ABSTRACT

This thesis reports the results of digitally simulating outstar
embedding field learning networks. An outstar is a device that is
capable of inductively learning to associate the occurrence of a
command event with a pattern of events. Once this association is
learned, the outstar will reproduce the pattern whenever the command
event occurs.

A simple outstar was studied. It was found that a fast rate for
forgetting accumulated experience is necessary to maintain control
of the amplitudes of the outstar's responses. It was further found
that a fast rate for forgetting accumulated experience results in
poor noise resistance but good adaptability. A slow forgetting rate
results in good noise resistance but poor adaptability. The practical
aspects of thresholds were studied.

A laterally inhibiting outstar was studied. It was found that
the active process of lateral inhibition results in both good noise
resistance and good adaptability.

A short study of outstar avalanches was made. An outstar avalanche
is a cascade of outstars which can learn and reproduce time varying
patterns. It was found that a command node cascade avalanche does not
work well because of pulse lengthening. A "long axon with collaterals"
avalanche was studied.

A virtual laterally inhibiting outstar was studied.

A convenient method for analyzing new formulations for the learning
process in an outstar was developed. A "generalized" learning process
was developed and studied.

The analogy between embedding field theory and the nervous system
of living organisms was introduced. The theoretical proposal that
learning on the neurophysiological level is due to the production of
transmitter in a synaptic cleft proportional to the correlation between
presynaptic and postsynaptic membrane potentials was used to
simplistically model a learning process for outstars.
CHAPTER 1     EMBEDDING FIELD NETWORKS
section 1.1     Introduction

Grossberg has developed a theory for learning called embedding
field theory (Refs. 1 - 10). He has proposed several devices designed
in accordance with this theory to handle broad categories of learning
phenomena. These devices are inductive learning machines which are
governed by a set of deterministic equations. He has qualitatively
demonstrated their learning abilities. He has further drawn an
analogy between embedding field theory and the nervous system of
living organisms. Based on this analogy, he has made a concrete
proposal for the neurophysiological phenomena underlying learning
in living organisms.

By means of a digital simulation, this thesis experimentally
studies one embedding field device called an outstar, and it
examines a combination of outstars called an outstar avalanche. The
analogy between the nervous system of living organisms and embedding
field theory will be introduced and examined.

For  the  uninitiated,  we  will  begin  by  deriving  the  basic  concepts 
of  embedding  field  theory  from  intuitive  ideas  about  learning. 
section 1.2     Illustrative Derivation of an Embedding Field Network

Embedding field theory is a mathematical model for learning. To
gain an operational appreciation of this model, consider modeling the
following learning experiment:

An experimenter teaches a subject an arbitrary time sequential list
of letters of the alphabet by saying the list to the subject several
times. At the end of this instruction, the subject is requested to
repeat the list. If he can, then it is concluded that he has learned
the list.

In order for the subject to learn the list, the letters composing
the list must be familiar to him and must appear to be separate events.
One of the tasks of this experiment will be to teach the subject to
combine the separate letters of the alphabet into a new event which is
the list. We expect that after instruction, presentation of the first
letter of the list will automatically result in the subject expecting to
hear the succeeding letters of this list.

We begin our description by modeling the subject's state before the
experiment has begun. He is familiar with the letters of the alphabet
and recognizes them as separate events. We model this by assigning a
distinct point in space to each letter of the alphabet and calling
these points nodes. To denote recognition of a letter of the alphabet,
$A_i$, we assign a time varying process $x_i(t)$ to each node $V_i$.
$x_i(t)$ has the properties:

(a) $x_i(t) \approx 0$ when the letter $A_i$ has not been presented to
the subject recently.

(b) $x_i(t) > 0$ when the letter $A_i$ has been presented to the subject
recently.

As $x_i(t)$ indicates only the two conditions (a) and (b) above, we
may constrain $x_i(t)$ to be non negative.

We model the experimenter's ability to communicate with the subject
similarly. When the experimenter says the letter $A_i$ to the subject, a
non negative input pulse $P_i(t)$ is delivered to the appropriate node
$V_i$ in the subject. The pulse $P_i(t)$ has the properties:

(c) $P_i(t) > 0$ when the experimenter says $A_i$.

(d) $P_i(t) \approx 0$ at all other times.

It will require a small, but finite, time interval for the experimenter
to say $A_i$. $P_i(t)$ is non zero during this time interval.

We are now in a position to write a differential equation for $x_i(t)$:

eqn (1)     $\dot{x}_i(t) = -\alpha x_i(t) + P_i(t)$

Equation (1) was chosen to model the response of $V_i$ to presentation
of the letter $A_i$ because it is the simplest continuous representation for
$x_i(t)$ satisfying conditions (a) and (b) on $x_i(t)$. Here $\alpha$ is a
positive decay rate, so that $x_i(t)$ returns toward zero once the pulse ends.
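
The behavior required by conditions (a) and (b) can be checked numerically. The following is a minimal sketch, not the simulation program used in this thesis: it integrates eqn (1) with a forward-Euler step and a rectangular input pulse. The step size, the decay rate, the pulse shape, and the function name simulate_node are all assumptions chosen for illustration.

```python
def simulate_node(alpha, pulse_start, pulse_width, pulse_height, t_end, dt=0.001):
    """Forward-Euler integration of dx/dt = -alpha*x + P(t), P a rectangular pulse."""
    start_step = int(round(pulse_start / dt))
    stop_step = int(round((pulse_start + pulse_width) / dt))
    x = 0.0
    trace = []
    for k in range(int(round(t_end / dt))):
        # P(t) is positive only while the experimenter is saying the letter.
        p = pulse_height if start_step <= k < stop_step else 0.0
        x += dt * (-alpha * x + p)
        trace.append(x)
    return trace


DT = 0.001  # assumed integration step
trace = simulate_node(alpha=5.0, pulse_start=1.0, pulse_width=0.2,
                      pulse_height=10.0, t_end=4.0, dt=DT)
x_before = trace[int(round(0.9 / DT))]  # letter not yet presented: condition (a)
x_peak = max(trace)                     # letter just presented: condition (b)
x_late = trace[-1]                      # letter presented long ago: condition (a)
```

With these assumed values the process is zero before the pulse, rises while the letter is being said, and decays back toward zero afterward, which is the qualitative behavior the text asks of $x_i(t)$.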

The experiment is now begun. The experimenter says a list $A_1, A_2,
\ldots, A_n$ to the subject. There will be a time interval, $w$, between the
presentation of each letter. For simplicity, we assume that these time
intervals are all the same.

At the beginning of the experiment the subject has no idea of what
the experimenter's list is. Therefore, when $A_1$ is presented the subject
can only guess, with probability 1/26 of success, what the experimenter's
selection for the second letter $A_2$ is. This carries throughout the list.
If the experimenter has presented letter $A_i$, the subject can only guess,
with probability 1/26 of success, what the letter $A_{i+1}$ is.
However, when the experimenter presents the list for the second time,
we expect the subject to be able to predict the succeeding letters of this
list with much greater accuracy. When the subject has learned the list,
he will be able to predict all the letters in the list, in their correct
order, with certainty.

We must now model this process.

Firstly,  we  have  said  that  the  subject  has  the  ability  to  predict 
what  the  succeeding  letters  of  the  list  are,  and  this  ability  becomes 
more  successful  after  each  presentation  of  the  list. 

Let us model this prediction process by connecting each of our nodes
to every other node with transmission lines which we shall call edges.
We allow the signal $x_i(t)$ from a node to travel away from that node along
the edges to each of the other nodes where it can act as input to these
nodes. The actual prediction of the letter following the letter $A_i$ is
modeled in the same manner as awareness of a letter being presented by
the experimenter. The appropriate $x_j(t)$ process is excited by the
prediction signals arriving via the edge from $V_i$.

The subject's ability to only blindly guess what each succeeding
letter is when the list is first presented means that equal prediction
signals are received at all nodes at the beginning of the experiment.
His ability to predict the entire list in the correct order after learning
means that after excitation of the $V_i$ node, a prediction signal is received
only at the correct $V_{i+1}$ node.

Prediction of the letters of the list in their correct order requires
that the $V_i$ node be excited by prediction signals before the $V_{i+1}$ node.
To accomplish this, we constrain the prediction signals traveling along
the edges to a finite transmission velocity. That is, the signal $x_i(t)$
originating at node $V_i$ arrives at node $V_{i+1}$ after a time delay of
$\tau_{i,i+1}$ time units.

Figure 1.2.1.   Geometric schematic of nodes and directed edges.

The situation we have described so far is pictured in figure 1.2.1.

In figure 1.2.1 we have drawn the edge $e_{ij}$ as two directed edges,
$e_{ij}$ and $e_{ji}$, to stress that the lists $A_i A_j$ and $A_j A_i$ are
distinct. The arrowhead indicates the direction of transmission along the
directed edge.

Referring to figure 1.2.1 one can easily see how the subject predicts
the succeeding letters of the list after he has learned it. If he has
learned the list $\ldots A_i A_j A_k \ldots$, excitement of $x_i(t)$ will result
in a signal traveling to $V_j$. It will arrive $\tau_{ij}$ time units later,
$x_j(t)$ will be excited, a signal will be sent to $V_k$, and so on. For
simplicity, we shall assume that all the transmission delays are equal, or
$\tau_{ij} = \tau$ for all i and j.

The effect of learning on the subject's prediction process is as
follows:

(e) Before learning, excitement of the $V_i$ node by presentation of
the letter $A_i$ results in equal prediction signals arriving at all nodes
to which $V_i$ is connected by edges, $\tau$ time units after presentation of
the letter $A_i$.

(f) After learning, excitement of the $V_i$ node by presentation of
the letter $A_i$ results in a large prediction signal being delivered to the
$V_{i+1}$ node from $V_i$, $\tau$ time units after presentation of $A_i$. No
prediction signals, or at least small prediction signals, are delivered to
the other nodes connected to $V_i$ by edges.

Now, we must develop a mechanism which converts the subject's
prediction process from state (e) to state (f) as the list is repeatedly
presented.

To develop this mechanism, we note that the experimenter is presenting
letters to the subject every $w$ time units. If $w$ is too small, say
1 millisecond, the subject will be unable to distinguish the separate
letters of the list and it will be impossible for him to learn the list.
On the other hand, if $w$ is too large, say 24 hours, we expect the subject
to have lost the context of the experiment. That is, if the experimenter
said "A" yesterday, and then says "C" today, we would not be surprised
if the subject responded, "See what?". Again, we do not expect the subject
to learn the list when $w$ is too large. In between these extremes we
expect the subject to do very well.

We  now  analyze  this  dependence  of  the  subject's  learning  ability 
on  the  presentation  interval  w. 

If $w$ is large, say $w \gg \tau$, then the process $x_i(t)$ has long ago
decayed to zero before the next letter is presented to the subject and
$x_{i+1}(t)$ becomes large. Additionally, the prediction signals from $V_i$ have
long since traveled to the ends of the edges from $V_i$, performed their
prediction excitement of the other nodes, and decayed. As $w$ is shortened
we begin to arrive at the situation where the prediction signal from $V_i$
arriving at the other nodes is still large when $V_{i+1}$ is excited by
presentation of the letter $A_{i+1}$. When $w = \tau$ the signal from $V_i$
arriving at the other nodes exactly correlates with the $x_{i+1}$ process.
Making $w$ smaller yet, such that $w \ll \tau$, means that the processes at
many nodes are large when the prediction signals from any one of the
excited nodes arrive at any other.

It seems likely that the subject's learning ability is dependent
on the correlation between his prediction signal arriving at the $V_{i+1}$
node from the $V_i$ node and excitement of the $x_{i+1}$ process by presentation
of the letter $A_{i+1}$. Assuming that this is the key to the subject's
learning ability, we may write down some properties for his learning
mechanism:

(g) If in one presentation of the list $\ldots A_i A_{i+1} \ldots$ the prediction
signal arriving at the $V_{i+1}$ node from the $V_i$ node is large at the same
time that the $x_{i+1}$ process is large, then on subsequent predictions of
the list a large prediction signal is delivered to $V_{i+1}$ from the $V_i$ node.

(h) If condition (g) is not met, then on subsequent predictions
a small prediction signal is delivered to the $V_{i+1}$ node.

In conditions (g) and (h) we have gotten into some geometrical difficulty.
Previously we had decided that the prediction signal traveling along an
edge $e_{ij}$ is the $x_i$ process from the $V_i$ node suitably time delayed to
account for the finite transmission velocity. If this signal is allowed
to arrive at the $V_j$ node unchanged it will always be large $\tau$ time units
after excitement of $V_i$. Yet in conditions (g) and (h) we have described
a process which determines the amplitude of the prediction signal being
delivered to $V_j$ based on the past correlations between the prediction
signal and the $x_j$ process. The difficulty is that we must now require
$V_j$ to perform two functions: that of keeping track of recent presentations
to, or predictions by, the subject of the letter $A_j$ via the $x_j$
process; and that of determining how vigorously the subject should predict
the letter $A_j$ based on past experience. The second of these
functions was placed at $V_j$ because it requires that both the prediction
signal $x_i(t - \tau)$ and the $x_j$ process be simultaneously available for
correlation.

Reference to figure 1.2.1 shows that besides $V_j$, the other place
where $x_i(t - \tau)$ and $x_j$ are simultaneously available for correlation is
the arrowhead of the $e_{ij}$ directed edge. In order to maintain one function
per element of figure 1.2.1, we shall locate a process, $z_{ij}$, in the
arrowheads of the directed edges with properties (g) and (h). This
simplifies things considerably, because we can make this process an
amplifier of prediction signals with the further properties:

(i) When $z_{ij}$ is large, a large prediction signal is delivered to $V_j$
from $V_i$.

(j) When $z_{ij}$ is small, a small prediction signal is delivered to $V_j$
from $V_i$.

A modification of eqn (1) is now in order to account for conditions
(i) and (j) above:

eqn (2)     $\dot{x}_j(t) = -\alpha x_j(t) + P_j(t) + \sum_i z_{ij}(t) x_i(t - \tau)$

Considering conditions (e), (f), (g), and (h) we may formulate an
equation for $z_{ij}$ as a function of time.

Condition (e) implies that before the experiment begins, $z_{ij}(t) \approx 0$.
That is, the initial conditions on the $z_{ij}$ are:

$z_{ij}(0) \approx 0$

Conditions (f) and (g) imply that $z_{ij}(t)$ gets large only when the
predicting signal $x_i(t - \tau)$ and process $x_j(t)$ are large at the same time,
and that $z_{ij}(t)$ remains large for a long time afterward. That is:

$\dot{z}_{ij}(t) \sim x_i(t - \tau) x_j(t)$

Condition (h) implies that when $x_i(t - \tau)$ and $x_j(t)$ are not large
at the same time, then $z_{ij}(t)$ decays toward zero. That is:

$\dot{z}_{ij}(t) \sim -u z_{ij}(t)$

Combining the above results, we have:

eqn (3)     $\dot{z}_{ij}(t) = -u z_{ij}(t) + x_i(t - \tau) x_j(t)$
with initial conditions $z_{ij}(0) \approx 0$.
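
The dependence of eqn (3) on the timing of the two processes can be illustrated with a small numerical experiment. The sketch below is an illustration under stated assumptions rather than the thesis's own program: it follows a single directed edge $e_{ij}$ through one presentation, integrates eqns (2) and (3) with a forward-Euler step, ignores the reverse edge $e_{ji}$, and uses rectangular pulses. All parameter values and the name learn_pair are hypothetical.

```python
def learn_pair(w, tau=0.5, alpha=5.0, u=0.1, dt=0.001, t_end=6.0):
    """Peak of z_ij when A_i is said at t = 1 and A_j is said w time units later."""
    delay = int(round(tau / dt))          # transmission delay in steps
    width = int(round(0.2 / dt))          # assumed pulse duration in steps
    start_i = int(round(1.0 / dt))
    start_j = int(round((1.0 + w) / dt))
    x_i = x_j = z = 0.0
    history_i = []                        # past values of x_i for the delayed signal
    z_peak = 0.0
    for k in range(int(round(t_end / dt))):
        p_i = 10.0 if start_i <= k < start_i + width else 0.0
        p_j = 10.0 if start_j <= k < start_j + width else 0.0
        history_i.append(x_i)
        x_i_delayed = history_i[k - delay] if k >= delay else 0.0
        dx_i = -alpha * x_i + p_i                       # eqn (1): no edge into V_i here
        dx_j = -alpha * x_j + p_j + z * x_i_delayed     # eqn (2) for a single edge
        dz = -u * z + x_i_delayed * x_j                 # eqn (3)
        x_i += dt * dx_i
        x_j += dt * dx_j
        z += dt * dz
        z_peak = max(z_peak, z)
    return z_peak


z_matched = learn_pair(w=0.5)   # presentation interval equal to the delay
z_slow = learn_pair(w=3.0)      # presentation interval much longer than the delay
```

Under these assumptions $z_{ij}$ grows appreciably only when the delayed signal from $V_i$ overlaps the excitation of $V_j$; when the second letter arrives long after the prediction signal has decayed, the product term in eqn (3) stays near zero.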


Equations (2) and (3) are sufficient to describe the subject's
learning process and its dependence on the experimenter's presentation
interval $w$. If the experimenter presents the letters of the list with
a time interval between each letter of approximately $\tau$ time units, then
when the letter $A_{j+1}$ is presented, the prediction signal from the $V_j$
node has arrived at the arrowhead of the $e_{j,j+1}$ edge and the product
$x_{j+1}(t) x_j(t - \tau)$ is large. From eqn (3), $z_{j,j+1}(t)$ grows. On
subsequent repetitions of the list the same conditions are met and
$z_{j,j+1}(t)$ grows larger yet. On the other hand, $x_k(t)$ for the nodes $V_k$,
$k \neq j+1$, corresponding to letters of the alphabet other than $A_{j+1}$, are
small when $A_{j+1}$ is presented and from eqn (3), $z_{jk}(t)$ decays toward zero
for $k \neq j+1$. When the subject is asked to recall the list, he uses his
prediction process, starting at the first letter $A_1$, and sequentially
excites each of the nodes corresponding to letters in the list in their
correct order by following the path of large $z_{ij}$'s until the end of the
list is reached. To prevent saddling ourselves with a cumbersome output
mechanism, we assume that the experimenter can read the amplitudes of
the $x_j(t)$ processes and considers a large $x_j(t)$ as a response by the subject.

One can easily see that when $w \gg \tau$, none of the products
$x_i(t - \tau) x_j(t)$ are large and the subject learns nothing. On the other
hand, when $w \ll \tau$, many nodes, $V_k$, are excited before the prediction
signals from the node associated with the first letter in the list
arrive at their corresponding arrowheads. Thus the associated $z_{ij}(t)$'s
grow large. This situation continues as the prediction signals from
the subsequent letters of the list arrive at their arrowheads. Called
upon to repeat the list, the subject's prediction process will equally
excite many nodes at the same time. To the subject, it will appear
that every letter of this list succeeds every other letter. Although
he has limited his guesses to the letters in the list, the subject is
no better off than he was at the beginning of the experiment in being
able to repeat the list.
section 1.3     Generalized Embedding Fields

The embedding field network derived in section 1.2 to model learning
of an alphabetic list is a specialized example of embedding field
networks. This particular network was derived because it illustrates
vividly the major ideas behind embedding field theory and its derivation
depends only upon intuitive ideas about learning. It is not the only
embedding field network which can learn time sequential lists and it
may not be the best network for this purpose. The alert reader may
have noticed that it cannot repeat a list in which a letter is repeated.
In addition to being dependent on the experimenter's presentation
interval $w$, its performance is highly dependent on the time delay $\tau$ and the
parameters $\alpha$ and $u$ in eqns (2) and (3). It has other problems, but
remarkably, Grossberg has shown that these problems are qualitatively
similar to problems experienced by human subjects trying to learn an
alphabetic list. (The interested reader is referred to references 1
and 3 for a detailed analysis of networks similar to that derived in
section 1.2.)

However, the network of section 1.2 contains most of the elements
of embedding field theory and we shall pause here to list them. Figure
1.3.1 shows the pictorial representation of these elements.

(1) A node $V_i$ representing an elemental event which the network
is capable of recognizing and responding to.

(2) A directed edge $e_{ij}$ allowing transmission of signals at a
finite velocity in one direction from node $V_i$ to node $V_j$. Pictorially,
a directed edge is drawn as an arrow shaft with the arrowhead indicating
the transmission direction.

Figure 1.3.1.   Elements of an embedding field network. The directed
edge $e_{ij}$ runs from node $V_i$ to node $V_j$ and ends in the arrowhead $N_{ij}$.
The process $x_i(t)$ is located at the $V_i$ node.
The process $x_j(t)$ is located at the $V_j$ node.
The process $z_{ij}(t)$ is located at the arrowhead $N_{ij}$.
The prediction signal $x_i(t - \tau)$ is arriving at the arrowhead $N_{ij}$.

(3) Arrowheads $N_{ij}$ representing the termination of directed
edge $e_{ij}$ on the node $V_j$. Because the directed edges transmit signals
without affecting them, it will not be necessary to reference signals
traveling along a directed edge until they reach the arrowhead. In
all subsequent equations in this paper, signals which have been
transmitted along a directed edge will be identified by the effect of the
transmission delay on them, i.e. $x_i(t - \tau)$.

(4) Input pulses $P_i(t)$ to node $V_i$ indicating the occurrence of the
elemental event represented by $V_i$ in the environment external to the
network. Input pulses will always be non negative and identically
zero except in a small time interval around the occurrence of event i.
It is assumed throughout this paper that $P_i(t)$ is immediately available
at $V_i$ whenever event i occurs. Because embedding field theory does not
deal with the input apparatus necessary to deliver inputs to nodes,
no geometric symbol has been developed for this purpose.

(5) A process $x_j(t)$ located at $V_j$ with the general formulation:

1.3.1     $\dot{x}_j(t) = -a(t) + \sum_i b_i(t - \tau) + P_j(t)$

The amplitude of $x_j(t)$ indicates whether the event represented by
$V_j$ has recently been observed or predicted by the network.

The term $a(t)$ is designed such that $x_j(t)$ always returns to some
ambient state indicative of no recent occurrence or prediction of event j.

The term $\sum_i b_i(t - \tau)$ is the effect of prediction signals on $V_j$.
The summation is taken over every arrowhead $N_{ij}$ impinging on $V_j$.
$b_i(t - \tau)$ is the modified prediction signal received by $V_j$ from the
arrowhead $N_{ij}$ impinging on it.


We will most frequently use the following formulations for these
functions:

$a(t) = \alpha x_j(t)$

$b_i(t - \tau) = \beta z_{ij}(t) x_i(t - \tau)$

With these formulations, equation 1.3.1 is:

$\dot{x}_j(t) = -\alpha x_j(t) + \beta \sum_i z_{ij}(t) x_i(t - \tau) + P_j(t)$

(6) A prediction signal modification process $z_{ij}(t)$ located in
the $N_{ij}$ arrowhead with the general formulation:

1.3.2     $\dot{z}_{ij}(t) = -u(t) + f(x_i(t - \tau), x_j(t))$

The $z_{ij}(t)$'s are the memory of the network. In general $z_{ij}(t)$ will
correlate prediction signals $x_i(t - \tau)$ with the process $x_j(t)$
via function f, and deliver a suitably modified prediction signal
$b_i(t - \tau)$ to $V_j$. The amplitude of $z_{ij}(t)$ is the network's memory of
how well $x_i(t - \tau)$ and $x_j(t)$ have correlated in the past. The term
$u(t)$ is the network's "forgetfulness". We will most frequently use
the following formulations for these functions:

$u(t) = u z_{ij}(t)$

$f(x_i(t - \tau), x_j(t)) = v x_i(t - \tau) x_j(t)$

With these formulations, equation 1.3.2 is:

$\dot{z}_{ij}(t) = -u z_{ij}(t) + v x_j(t) x_i(t - \tau)$

Combining the geometric elements of figure 1.3.1 in various ways
and suitably defining the terms of eqns 1.3.1 and 1.3.2, Grossberg
has developed networks which qualitatively model many general categories
of learning phenomena. In addition to describing learning phenomena
on the psychological level as in section 1.2, Grossberg has drawn an
analogy between embedding field networks and nerve networks in living
organisms which is a concrete theoretical proposal for the
neurophysiological phenomena underlying learning in living organisms. (See
references 2 and 4.)

The power of embedding field theory is that it is a generalized
theory describing learning with deterministic equations. The equations
are simple enough to allow mathematical analyses and the establishment
of the conditions necessary for them to perform the tasks desired of
them. Due to the large number of nodes and arrowheads necessary to
model a particular learning phenomenon, exact analytic descriptions
of their performance are difficult. However, the basic simplicity of
the equations makes the simulation of their performance straightforward
on a high speed computer.


CHAPTER 2     THE OUTSTAR AND THE OUTSTAR AVALANCHE EMBEDDING FIELD NETWORKS
section 2.1     Description of the Networks

The embedding field network of section 1.2 was derived to illustrate
the concepts of embedding field theory. Combining the elements of his
theory in another way, Grossberg has proposed two very interesting
networks which this paper will study. The outstar network, and a
combination of outstars called an outstar avalanche, are networks capable
of learning and reproducing any number of complicated space-time
patterns.

Figure 2.1.1 presents the geometric schematic for an outstar and
the basic equations governing its performance. The N grid nodes $V_1, V_2,
\ldots, V_N$ represent the set of elemental events the network is capable of
recognizing. Each of the distinct combinations of elemental events
taken singly or several at a time is a distinct pattern.

The command node $V_c$ represents an event which always precedes a
particular pattern of grid elemental events. The function of the outstar
is to learn to associate the occurrence of the event associated with the
command node causally with the occurrence of the grid pattern. After
learning this "causal" association, the occurrence of the command node
event will result in the associated pattern occurring on the grid, even
though there are no external inputs to the grid.

Figure 2.1.1.   An outstar network and the equations governing its
performance. (Schematic labels: command node $V_c$; grid nodes $V_1, \ldots, V_N$.)

EQUATIONS GOVERNING NETWORK PERFORMANCE

2.1.1     $\dot{x}_c(t) = -\alpha x_c(t) + P_c(t)$

2.1.2     $\dot{x}_i(t) = -\alpha x_i(t) + \beta z_{ci}(t) x_c(t - \tau) + P_i(t)$

2.1.3     $\dot{z}_{ci}(t) = -u z_{ci}(t) + v x_c(t - \tau) x_i(t)$

Figure 2.1.2.   An outstar avalanche and the equations governing its
performance. (Schematic labels: starting node; command node cascade;
grid nodes $V_1, V_2, \ldots$)

EQUATIONS GOVERNING NETWORK PERFORMANCE

2.1.4     $\dot{x}_{c1}(t) = -\alpha x_{c1}(t) + P_{c1}(t)$

2.1.5     $\dot{x}_{ci}(t) = -\alpha x_{ci}(t) + A x_{c,i-1}(t - \tau)$   for $1 < i \le M$

2.1.6     $\dot{x}_j(t) = -\alpha x_j(t) + \beta \sum_{i=1}^{M} z_{ci,j}(t) x_{ci}(t - \tau) + P_j(t)$   for $1 \le j \le N$

2.1.7     $\dot{z}_{ci,j}(t) = -u z_{ci,j}(t) + v x_{ci}(t - \tau) x_j(t)$

As an illustration, the outstar may be used to model a pianist
playing a piano from a score. Excitement of the $x_c$ process at the
command node represents the event of reading the notes associated with
a chord on his score. The grid nodes represent his fingers and a large
$x_i(t)$ is interpreted as the ith finger being lowered to strike a piano
key. A small $x_j(t)$ represents the jth finger being raised so as not
to strike a key. By practice the piano player will learn the proper
finger positions associated with the written chord in the musical
score. The outstar will learn the proper finger positions by reading
the chord on the score and having its fingers placed in the proper
positions sufficiently often. This finger pattern will be remembered by
large and small $z_{ci}(t)$'s at the appropriate arrowheads impinging on
the grid nodes. After having learned the association between the written
chord on the score and the finger positions, both the pianist's and the
outstar's fingers will automatically assume the proper position when the
chord is read.
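
The pianist illustration can be tried numerically with eqns 2.1.1 to 2.1.3. The following is a minimal sketch and not the simulation program of this thesis: it assumes forward-Euler integration, rectangular pulses, a command pulse at the start of each trial, grid pulses $\tau$ later during training, and one final trial in which only the command node is excited. The parameter values and the name simulate_outstar are hypothetical.

```python
def simulate_outstar(pattern, n_trials=5, alpha=5.0, beta=1.0, u=0.05, v=1.0,
                     tau=0.5, dt=0.001, spacing=3.0, pulse_width=0.2):
    """Train on `pattern` for n_trials, then run one recall trial with no grid input."""
    n = len(pattern)
    delay = int(round(tau / dt))
    width = int(round(pulse_width / dt))
    trial_steps = int(round(spacing / dt))
    x_c = 0.0
    x = [0.0] * n                 # grid processes x_i
    z = [0.0] * n                 # arrowhead processes z_ci
    history_c = []                # past values of x_c for the delayed signal
    recall_peak = [0.0] * n
    for k in range((n_trials + 1) * trial_steps):
        trial, k_in_trial = divmod(k, trial_steps)
        training = trial < n_trials
        p_c = 10.0 if k_in_trial < width else 0.0
        grid_on = training and delay <= k_in_trial < delay + width
        history_c.append(x_c)
        x_c_delayed = history_c[k - delay] if k >= delay else 0.0
        dx_c = -alpha * x_c + p_c                                      # eqn 2.1.1
        for i in range(n):
            p_i = pattern[i] if grid_on else 0.0
            dx_i = -alpha * x[i] + beta * z[i] * x_c_delayed + p_i    # eqn 2.1.2
            dz_i = -u * z[i] + v * x_c_delayed * x[i]                 # eqn 2.1.3
            x[i] += dt * dx_i
            z[i] += dt * dz_i
            if not training:
                recall_peak[i] = max(recall_peak[i], x[i])
        x_c += dt * dx_c
    return z, recall_peak


# Assumed chord: first finger pressed hard, second raised, third pressed lightly.
weights, recall = simulate_outstar(pattern=[10.0, 0.0, 5.0])
```

With these assumed settings, the recall trial reproduces the ordering of the trained amplitudes on the grid even though no grid input is supplied, and the node that never received an input stays at zero. The absolute amplitudes depend strongly on the forgetting rate $u$ and on the number of trials, so the numbers themselves should not be read as results of this thesis.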

Figure 2.1.2 presents the geometric schematic of an outstar avalanche
and the basic equations governing its behavior. An outstar
avalanche is a cascaded series of outstars. Each outstar learns and
is capable of reproducing the pattern on the grid approximately τ
time units after its command node is excited. The command nodes are
deterministically cascaded. That is, excitation of the starting node
V_c1 by an input will always result in a prediction signal going to V_c2,
which will send one to V_c3, and so on. There is no learning associated
with this. The command node cascade is an embedding field clock.
Because the prediction signals travel along directed edges at constant
velocities, excitement of the starting node results in a prediction
signal arriving at command node V_ci (i - 1)τ time units later. If
a time varying pattern of elemental events is being played on the grid,
then each command node takes a picture of that pattern when it is
excited. Thus associating the start of a particular time varying
pattern, say a piano sonata, with excitement of the starting node will
result in a time sequential series of pictures approximating that
pattern being learned by the network. If many command nodes are
cascaded in this manner and τ is made sufficiently small, the sampled
data approximation of the pattern can be made arbitrarily close to
the pattern.
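
The sampled-data idea can be illustrated with a short sketch. It is not taken from the thesis: the function names, the scalar test pattern `tone`, and the use of a zero-order hold to compare snapshots against the pattern are all assumptions made for illustration. Command node i is taken to record the pattern at time (i - 1)τ, and the worst-case gap between the held snapshot and the true pattern is measured for a coarse and a fine τ.

```python
import math

def avalanche_snapshots(pattern, n_nodes, tau):
    """Snapshot recorded by command node i (1-based) at time (i - 1) * tau."""
    return [pattern((i - 1) * tau) for i in range(1, n_nodes + 1)]

def worst_hold_error(pattern, duration, tau, probe_dt=0.001):
    """Largest gap between the pattern and the most recent snapshot (assumed hold)."""
    n_nodes = math.ceil(duration / tau)
    snapshots = avalanche_snapshots(pattern, n_nodes, tau)
    worst = 0.0
    for k in range(int(round(duration / probe_dt))):
        t = k * probe_dt
        node = min(int(t / tau), n_nodes - 1)  # index of the latest snapshot
        worst = max(worst, abs(pattern(t) - snapshots[node]))
    return worst

def tone(t):
    """Hypothetical one-dimensional time varying pattern."""
    return math.sin(2.0 * math.pi * t)

coarse = worst_hold_error(tone, duration=1.0, tau=0.1)
fine = worst_hold_error(tone, duration=1.0, tau=0.01)
print(coarse, fine)  # the finer cascade tracks the pattern more closely
```

A real grid carries a vector of elemental events rather than a single value; the scalar pattern is used here only to keep the sketch short.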
section 2.2    Theoretical Work on Outstars and Outstar Avalanches

Grossberg has mathematically analyzed the pattern learning abilities
of outstars and outstar avalanches extensively (Refs 7 and 8). In
the process of this analysis he developed particularly handy mathematical
descriptions of a pattern of elemental events, the pattern learned
by the outstar to approximate this pattern, and the pattern reproduced
by the outstar on its grid when predicting the elemental event pattern.

An elemental event pattern is defined by the values of the input
pulses P_i(t) at the grid nodes. Although their amplitudes may be
different, all input pulses have the same shape. We can describe the
relation of the ith input pulse to the other N - 1 inputs (consisting
of nonzero pulses P_j(t) indicating that event j is part of the
pattern and zero pulses P_k(t) indicating that event k is not part of
the pattern) by forming the probability

$$\theta_i = \frac{P_i(t)}{\sum_{j=1}^{N} P_j(t)}$$

when any P_j(t) comprising the pattern is nonzero.

The elemental pattern can be completely described by the N dimensional
vector θ = (θ_1, θ_2, ..., θ_N).

Note that this description of the pattern is amplitude independent.
That is, θ defines the pattern whether that pattern is presented
vigorously or not. Additionally note that by the definition of
the θ_i, θ not only describes a pattern by the occurrence or
nonoccurrence of elemental events in it, but also by the relative strength
of the occurrence of those events. In the piano playing example, this
corresponds to describing the finger positions for a chord by indicating
which fingers are raised so as not to strike keys and which fingers
are lowered to strike keys, plus the relative pressure each of the lowered
fingers is to exert on the keys.

Since the P_i(t)'s have the same shape, and differ only in amplitude,
the θ_i's are constants during presentation of the pattern.
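
A minimal sketch of this normalization follows. The function name `pattern_weights` and the example amplitudes are illustrative assumptions, not part of the original study. It shows that scaling every pulse amplitude by the same factor leaves the pattern vector θ unchanged.

```python
def pattern_weights(amplitudes):
    """Return theta_i = P_i / sum_j P_j for a list of pulse amplitudes."""
    total = sum(amplitudes)
    if total == 0:
        raise ValueError("theta is defined only when some pulse is nonzero")
    return [a / total for a in amplitudes]

# The same chord played softly and vigorously (hypothetical amplitudes).
soft = pattern_weights([1.0, 0.0, 3.0])
loud = pattern_weights([10.0, 0.0, 30.0])
print(soft, loud)  # both are [0.25, 0.0, 0.75]
```

The same normalization, applied to the x_i(t) or the z_ci(t) instead of the pulse amplitudes, gives the probability vectors used to describe the outstar's response and memory.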

In a similar manner the outstar's response to presentation or
prediction of this pattern can be described by the probability vector:

$$X(t) = (X_1(t), X_2(t), \ldots, X_N(t)), \quad \text{where} \quad X_i(t) = \frac{x_i(t)}{\sum_{j=1}^{N} x_j(t)}$$

The pattern learned by the outstar to approximate this pattern can be
described by the probability vector:

$$y(t) = (y_1(t), y_2(t), \ldots, y_N(t)), \quad \text{where} \quad y_i(t) = \frac{z_{ci}(t)}{\sum_{j=1}^{N} z_{cj}(t)}$$

Now suppose that the pattern θ has been presented to the outstar
M times. Then Grossberg has proved that, starting with arbitrary initial
data for the x_i(t)'s and z_ci(t)'s:

(a) For every M ≥ 1, the limits

$$Q_i^{(M)} = \lim_{t \to \infty} X_i^{(M)}(t) \quad \text{and} \quad R_i^{(M)} = \lim_{t \to \infty} y_i^{(M)}(t)$$

exist.

(b) For every M ≥ 1 and for all times t after the last presentation
of the pattern, the probabilities $X_i^{(M)}(t)$ and $y_i^{(M)}(t)$ are monotonic
in opposite senses with $|y_i^{(M)}(t) - X_i^{(M)}(t)|$ nonincreasing, and are constant
on intervals where the prediction signal from V_c is zero.

(c)

$$\lim_{M \to \infty} m_i^{(M)} = \lim_{M \to \infty} M_i^{(M)} = \theta_i$$

where

$m_i^{(M)}$ = minimum of $X_i^{(M)}(t_0)$ or $y_i^{(M)}(t_0)$,

$M_i^{(M)}$ = maximum of $X_i^{(M)}(t_0)$ or $y_i^{(M)}(t_0)$,

and t_0 is the instant the last presentation of the pattern was
completed.

Thus by (a)-(c),

$$\lim_{M \to \infty} \lim_{t \to \infty} X_i^{(M)}(t) = \lim_{M \to \infty} \lim_{t \to \infty} y_i^{(M)}(t) = \theta_i$$

(d) The functions $\dot{y}_i^{(M)}(t)$, $f_i^{(M)} = y_i^{(M)}(t) - X_i^{(M)}(t)$ and $g_i^{(M)} = X_i^{(M)}(t) - \theta_i$
change sign at most once, and not at all if $f_i^{(M)}(0)\, g_i^{(M)}(0) \ge 0$.
Moreover, $f_i^{(M)}(0)\, g_i^{(M)}(0) > 0$ implies $f_i^{(M)}(t)\, g_i^{(M)}(t) > 0$ for all t ≥ 0.
l       i  11 

Interpreting these results, we see that (c) implies that the
network's memory of the pattern and its predictions of the pattern
converge to the pattern as the number of times the pattern is
presented increases, or "practice makes perfect". (a) and (b) ensure
that the network's memory of and prediction of the pattern after the
last presentation of the pattern will get no worse than it was
immediately after that last presentation. (d) shows that there is at most
one oscillation in the convergence and therefore the network's learning
ability is stable.

An additional benefit of result (c) is that if the network started
associating one pattern with the command node event and it is decided
that association is an error, then a new pattern, the correct one,
may be learned over the old one with sufficient practice. That is,
all errors are correctable.
section 2.3    Approach to the Study

Grossberg's theoretical results greatly enhance the attractiveness
of outstars and avalanches as devices for modeling certain categories
of learning phenomena. As qualitative models they have wide application
(see Refs 6 and 9). However, beyond the qualitative insight that they
provide, are they practical? The mathematics guarantee that an avalanche
will learn a piano sonata with sufficient practice. If sufficient
practice means forty years, we would do well to go shopping for another
model - not because they do not work, but because they do not work well
enough.

Thus the question "How well do they work?" is pertinent. This
is the question that this paper addresses. It is a practical question,
and throughout the rest of this paper outstars and avalanches are
considered as practical devices that learn.

In order to accomplish this study, a digital simulation of the
networks was programmed on a computer. The details of this simulation
and an evaluation of its accuracy are provided in appendix A. Every
attempt was made to reduce the artificialities and errors introduced
by this method of study. However, constraints were forced on the study
by the digital simulation, and these constraints will be noted and
explained as they occur in this paper.

As an outstar avalanche is a cascade of outstars, the primary
emphasis of this study is on outstars. In studying the outstars,
attention is devoted to the possible interactions of one outstar in an
avalanche with another. Where avalanches are presented, they are more
or less used as tests to confirm the conclusions established while
studying the outstars composing them.


CHAPTER 3     THE SIMPLE OUTSTAR

section 3.1     Specification of Parameters for the Study

The geometric schematic and equations in figure 2.1.1 describe
the simplest outstar. The equations are repeated here for easy
reference:

3.1.1  $\dot{x}_c(t) = -\alpha x_c(t) + P_c(t)$

3.1.2  $\dot{x}_i(t) = -\alpha x_i(t) + P_i(t) + \beta z_{ci}(t)\, x_c(t - \tau)$

3.1.3  $\dot{z}_{ci}(t) = -u z_{ci}(t) + v x_i(t)\, x_c(t - \tau)$

In order to study this outstar, we must assign numbers to the constants
α, β, u, v, and τ; initial conditions must be assigned to the variables
x_c, x_i, and z_ci; a shape and amplitude for the inputs P_c and P_i must be
selected; and the number of pattern nodes, N, must be specified.
Additionally, the test pattern to be taught to the outstar must be
decided upon.

A  great  deal  of  experimental  time  can  be  saved  if  these  parameters 
are  specified  in  a  somewhat  rational  way.  A  rationale  can  be  developed 
for  any  method  of  specifying  the  parameters,  so  we  shall  arbitrarily 
begin  with  the  inputs. 

Firstly, the inputs are only used to indicate the occurrence of
elemental events external to the outstar. All we require of them is
that they be nonnegative in an interval around the occurrence of the
elemental event and zero at all other times. Also, we would like them
to reflect the strength of presentation of the events they represent.
For a first try we will make them identical in shape, duration, and
amplitude for both the grid inputs P_i(t) and the command input P_c(t).
An impulse might be a good shape for them, but there might be effects
associated with duration that would be interesting to see. On the
other hand, if we want to check our results analytically, then we want
the inputs' shape to be simple enough to make the analysis tractable.
A rectangular pulse of amplitude A and duration δ is suitable. Note
that with this selection for inputs we have implied that our input
apparatus is a digital sampling device which samples the continuous
variation of events in the external environment at time t_0, sets the
inputs to nodes corresponding to events present in the environment
at t_0 to value A, and holds these values until the next sample is taken
at time t_0 + δ. If we recall that an avalanche performs a similar
digital approximation to time varying events, this selection for inputs
is not too bad.

As the direct response to the inputs is linear, we may leave the
amplitude, A, of the input pulses arbitrary. In selection of the duration
δ, we run into a compromise with the digital simulation. An accurate
simulation of the response to a long duration pulse requires considerable
computation time. Thus to minimize computation time, δ should be short.
Yet the pulses were given a finite duration to study possible effects
of duration, so we do not want δ to be too short. With this trade-off in
mind, a good selection for δ would be the shortest rise time in the
outstar. The rise times of the outstar are 1/α for the x processes
at the nodes, and 1/u for the z processes at the arrowheads. u is the
"forgetting rate" of the outstar, and it would be expected that the
forgetting rate of the outstar should be slower than the response
rate, α, of the x processes. Therefore it is reasonable that α should
be greater than u. This implies that 1/α is the shortest rise time
in the outstar, and we shall set δ = 1/α.

The x processes at the command node and the grid nodes indicate
the recent presentation to or prediction by the outstar of events. At
the beginning of a learning experiment it is reasonable to assume that
there have been no recent presentations or predictions of the events to
be learned. The initial conditions for the x processes can be assumed
zero, i.e., x_c(0) = 0 = x_i(0) for all i.

The response rate α of the x processes has already been specified
as α = 1/δ. Thus all the parameters for the command node's x
process have been specified. For the grid nodes τ, β, and the initial
conditions on the z's still must be specified. To save computation
time, τ should be small. As there is no feedback from the grid to the
command node, there is no necessity for τ to be nonzero in this simple
outstar. In a digital simulation, however, the accuracy is improved
if there is a time delay between simultaneous processes, and making
τ > 0 is advantageous. A suitable selection for τ is τ = δ.

From equation 3.1.2 it can be seen that β and z_ci(t) determine
the amplitude of the prediction signal being admitted to grid node
V_i. As the outstar's memory is the z_ci(t)'s, they are the most important
factor in determining this prediction signal amplitude. Setting
β = 1 will make analyzing the effect of the z's on the prediction
signals easier.

The parameters associated with the z processes, u, v, and initial
conditions z_ci(0), must be specified. u is the "forgetting rate" of
the outstar. As we want the outstar to remember what it has learned,
we want u to be small. Remembering that computation time is scarce,
a small u for this experiment is anything such that the decay time
1/u of the z processes is several times longer than the length of the
experiment.

Selecting v is a problem. As can be seen from equation 3.1.3,
v determines the rise rate and amplitude of the z process given an
x_i(t) response and the prediction signal x_c(t - τ). In presenting
a pattern to the outstar to be learned, the best learning should occur
when the inputs to the grid nodes are presented at the same time as
the prediction signals from the command node arrive at the arrowheads.
The problem is: in this situation, how well should the outstar learn
the pattern on the first presentation? To answer this question, we
need some way of measuring how well the outstar has learned a pattern
after presentation.

A tentative operational measurement would be to say that the outstar
has learned a pattern well when the prediction process drives the
amplitudes of the grid node x processes to at least the same values as
they are driven to by the event inputs. Using this measurement we
can specify v's which result in well learning in one presentation or
two presentations and so on.

However, this does not end the problem associated with "rationally"
selecting an initial v for an experiment. Suppose we specify a v
which results in well learning in one presentation. What value should
this v have? A rational selection of an initial v requires solving
the outstar equations. The reason why the outstar is being simulated
is the difficulty of analytically solving these equations. To avoid
these difficulties, the procedure taken in this study was to specify
all other parameters in the outstar including the number of
presentations required for well learning. A guess is then made for a v and an
experiment is performed to see what amplitude the prediction process will
drive the grid nodes to after one pattern presentation. The guessed
v is then appropriately scaled to result in the specified well learning
criterion.

For the current experiment, v was selected to result in well
learning in two pattern presentations.

Concerning the initial conditions for the z processes, we expect
on the first presentation of the pattern that the network has not
previously learned anything about the pattern. That is, z_ci(0) = 0
for all i. However, we would like to see what happens if one of the
z_ci's is not zero at the beginning of the experiment. Therefore we
will make one of the z_ci(0) nonzero, but small.

Only the number, N, of grid nodes and the test pattern to be
taught the outstar remain to be specified. As we are only performing
this experiment as an initial look at an outstar, a good test pattern
would be presentations of one event which the outstar should learn
to associate with the command event. An additional event presented
at a time well removed from arrival of prediction signals from the
command node would be a good way to test interference between outstars
in an avalanche. As v was selected to result in well learning in two
presentations, this test pattern will be presented twice and then a
prediction will be called for to see how well the pattern has been
learned.

This gives us two grid nodes. A third grid node is included to
study the effects of the nonzero initial condition on a z process.
No inputs will be given to this grid node.
We need now only assign numbers to the parameters in accordance
with the above specifications:

Geometric parameters:

N = number of grid nodes = 3

τ = time delay of prediction signal = 0.3 sec.

Input parameters:

Input pulse shape is rectangular

A = input pulse amplitude = 10

δ = input pulse duration = 0.3 sec.

Input pulses will be delivered to the command node, V_c, at times:
0.1 sec, 1.9 sec, and 3.7 sec.

No input pulses will be delivered to grid node V_1.

Input pulses will be delivered to grid node V_2 at times: 0.4 sec
and 2.2 sec.

Input pulses will be delivered to grid node V_3 at times: 1.0 sec
and 2.8 sec.

Network parameters:

α = rate constant of x process = 3.3333 sec^-1

β = prediction signal amplification constant = 1.0

u = "forgetting rate" = 0.01 sec^-1

v = correlation amplification constant = 1.6 (satisfies the well
learning in two presentations criterion)

Initial conditions:

x_c(0) = x_i(0) = 0 for all i

z_c1(0) = 0.1

z_c2(0) = z_c3(0) = 0
The above lengthy description of the reasons for selection of the
parameters for the experiment to be presented in the next section was
provided as an illustration of the decisions that must be made when
performing the experiments in this study. Except where noted, in the
future the same reasoning will underlie the selection of parameters
for experiments.
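
The parameter choices above are enough to sketch the experiment in code. The following is a minimal sketch only: it integrates equations 3.1.1 to 3.1.3 with forward Euler and a fixed step `dt`, which is an assumption and not necessarily the integration method described in appendix A. The function name `simulate_experiment_one`, the step size, and the 5 second run length are likewise illustrative choices.

```python
def simulate_experiment_one(u=0.01, v=1.6, t_end=5.0, dt=0.001):
    """Forward-Euler sketch of equations 3.1.1-3.1.3 with the section 3.1 values."""
    alpha, beta, tau = 1.0 / 0.3, 1.0, 0.3      # alpha = 1/delta, about 3.3333
    amp, delta = 10.0, 0.3                      # rectangular pulse A and duration
    command_times = [0.1, 1.9, 3.7]             # pulses into V_c
    grid_times = [[], [0.4, 2.2], [1.0, 2.8]]   # pulses into V_1, V_2, V_3
    z_initial = [0.1, 0.0, 0.0]                 # z_c1(0), z_c2(0), z_c3(0)

    n_steps = round(t_end / dt)
    delay = round(tau / dt)                     # prediction signal delay in steps
    width = round(delta / dt)                   # pulse duration in steps

    def pulse(k, starts):
        """Amplitude of the rectangular input at step k."""
        for s in starts:
            first = round(s / dt)
            if first <= k < first + width:
                return amp
        return 0.0

    xc = [0.0] * (n_steps + 1)
    x = [[0.0] * (n_steps + 1) for _ in range(3)]
    z = [[z_initial[i]] + [0.0] * n_steps for i in range(3)]

    for k in range(n_steps):
        signal = xc[k - delay] if k >= delay else 0.0   # x_c(t - tau)
        xc[k + 1] = xc[k] + dt * (-alpha * xc[k] + pulse(k, command_times))
        for i in range(3):
            drive = pulse(k, grid_times[i]) + beta * z[i][k] * signal
            x[i][k + 1] = x[i][k] + dt * (-alpha * x[i][k] + drive)
            z[i][k + 1] = z[i][k] + dt * (-u * z[i][k] + v * x[i][k] * signal)
    return xc, x, z

xc, x, z = simulate_experiment_one()
print(max(xc), z[0][-1], z[1][-1], z[2][-1])
```

The returned lists hold the traces of x_c, the three x_i, and the three z_ci at every step, so other values of u and v can be tried by passing them as arguments.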
section 3.2     Experiment I - A Look at a Simple Outstar

Figure 3.2.1 shows the results of the experiment outlined in
section 3.1. The inputs to the nodes are plotted on the same trace as
the x process response of the nodes.

A striking feature of figure 3.2.1 is that the x process node
responses all have amplitudes significantly less than the amplitudes
of the input pulses. It can be seen that this is as it should be
if we consider the equation governing the response of a node to an
input only:

$$\dot{x}_i(t) = -\alpha x_i(t) + P_i(t)$$

The solution of this equation for a rectangular input pulse of
amplitude A and duration δ is:

$$x_i(t) = \begin{cases} (A/\alpha)\,(1 - e^{-\alpha t}) & \text{for } 0 \le t \le \delta \\ (A/\alpha)\,(1 - e^{-\alpha\delta})\, e^{-\alpha(t - \delta)} & \text{for } t \ge \delta \end{cases}$$

The maximum of this response occurs at t = δ. For the parameters
specified for this experiment, the maximum amplitude of an x_i(t) response
to an input pulse only is:

max x_i(t) = 1.9

which is about 20% of the amplitude of the input pulses.
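
As a quick check of this figure, the closed-form response can be evaluated directly. The sketch below assumes the pulse starts at t = 0 and uses α = 1/δ with δ = 0.3 sec and A = 10, as specified in section 3.1; the function name `pulse_response` is an illustrative choice.

```python
import math

def pulse_response(t, amp, alpha, delta):
    """x(t) for dx/dt = -alpha*x + P(t), x(0) = 0, rectangular pulse on [0, delta]."""
    if t < 0.0:
        return 0.0
    if t <= delta:
        return (amp / alpha) * (1.0 - math.exp(-alpha * t))
    peak = (amp / alpha) * (1.0 - math.exp(-alpha * delta))
    return peak * math.exp(-alpha * (t - delta))

amp, delta = 10.0, 0.3
alpha = 1.0 / delta
peak = pulse_response(delta, amp, alpha, delta)
print(round(peak, 3), round(peak / amp, 2))  # 1.896 0.19
```

With α δ = 1 the peak reduces to (A/α)(1 - 1/e), or about 3.0 x 0.632.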

The pattern we intended to teach to the outstar was to associate
the occurrence of the command event with event 2. The outstar was
instructed in this pattern twice by presenting the command event to it
and then presenting event 2 to it τ time units later. This can be
seen from the command input trace and grid node V_2's input trace.
After the instruction was over, the command event alone was presented
to see if the outstar had learned the pattern. As can be seen from
V_2's third response, the outstar did predict event 2 and we can consider
that it has learned the pattern.

Figure 3.2.1.   The results of experiment I - an initial look
at a simple outstar.

This experiment was also designed to see what effect a small
nonzero initial condition on a z would have. Thus z_c1 was given the initial
value of 0.1 while z_c2 and z_c3 were given zero initial values. As
can be seen from the x_1 response trace, the small nonzero initial
value for z_c1(0) had no perceptible effect. x_1(t) did respond to
the prediction signals, but the response was so small that it does
not show on the scale chosen for figure 3.2.1.

We gave the input pulses a finite duration to see if there would
be any effects associated with this duration. One such duration effect
is the fact that the x responses reach a maximum at the end of the input
pulses and then decay exponentially away from this maximum. This effect
is entirely due to the shape selected for the input pulses and the
exponential response of the x processes. If we accept the sampled
data input apparatus described in section 3.1 as the input apparatus
for the outstar, then this effect has important consequences. It says
that the outstar's response to a sample taken at time t_0 extends, with
large amplitude, into the next sampling period starting at t_0 + δ
and beyond. In this experiment, we selected the inputs to V_3 to occur
2δ after the inputs to V_2. As explained above, the inputs to V_2
were selected to result in maximum learning. From the trace for x_3(t)
it can be seen that event 3 was also learned to be associated with the
command event, although to a much lesser extent. This resulted from the
"tail" of the prediction signal still being reasonably large when event
3 occurred. The product x_3(t)x_c(t - τ) was therefore sufficient to
cause z_c3 to grow, as can be seen from z_c3(t)'s trace. Thus when the
outstar was tested to see what it had learned, it predicted event 3
as well as event 2.

Thus the "tail" duration effect will result in the outstar learning
not only what happens in the sample in which prediction signals arrive
from the command node, but also in the sample taken after that. By
symmetry, it will learn the samples taken before in the same way.
We will mark this effect for further study.

Another effect to note in figure 3.2.1 is that z_c2(t) grew with
each presentation of the pattern and on the recall test. Because u
was chosen small, z_c2(t) did not decrease and essentially acted as
an integrator of v x_2(t)x_c(t - τ). The effect of the growing z_c2(t)
can be seen in the trace for x_2(t), where the x response increases in
amplitude on each presentation or prediction. If this growth continues,
we could expect the x responses to get impractically large. Experiment I
was continued and the x_2(t) responses did continue their growth.
Figure 3.2.2 shows this continuation, and it can be seen from the trace
for x_2(t) that the x responses continued to grow on predictions only.
Not only are the x_2 responses growing with each prediction, but a quick
look at the z_c2(t) trace will show that they are growing at an increasing
rate.

Experiment I was continued not only to study the growth of x_2
responses but also to test the theoretical prediction that outstars
are capable of correcting all mistakes. An attempt was made to correct
two types of mistakes in the continuation. It was decided to consider
the already learned association between the command event and event 2
as a mistake and that the correct association should be with event 3.
Therefore, event 3 was presented τ time units after presentation of
the command event three times. Event 2 was not presented at all.

Figure 3.2.2.   Continuation of experiment I.

The second type of mistake was simulation of a "random" mistake
by presenting event 1 once τ time units after presentation of the
command event.

The results are interesting. Due to their growth, x_2 responses
continued to be greater than x_1 and x_3 responses. The x_3 responses were
catching up with the x_2 responses, but from the z_c2(t) and z_c3(t)
traces, it can be seen that it will require many presentations of the
V_c → V_3 association before x_3 responses will reach a point where we
could say that the V_c → V_2 mistake is corrected.

From x_1's trace it can be seen that the "random" mistake was
remembered by the outstar. It was also predicted with increasing
amplitude on subsequent predictions. However, with the results of experiment I
plotted as they are in figure 3.2.2 it is difficult to see if any
mistakes were corrected. The theoretical prediction that all mistakes
could be corrected involved the convergence of the probabilities X_i(t)
and y_i(t) to θ_i. Translating the data from figure 3.2.2 to these
probabilities, we have the following results:

Table 3.2.1

Translation of data from figure 3.2.2 to probabilities suitable
for comparison to the theoretical prediction that an outstar can correct
all mistakes.

                    Response number, M
                 0        1        2        3

V_1    θ_1       0        0.5      0        0
       X_1      ≈0        0.188    0.083    0.097
       y_1       0.007    0.083    0.091    0.101

V_2    θ_2       1.0      0        0        0
       X_2       0.892    0.563    0.625    0.612
       y_2       0.886    0.75     0.682    0.632

V_3    θ_3       0        0.5      1.0      1.0
       X_3       0.107    0.249    0.292    0.319
       y_3       0.107    0.167    0.227    0.265

The M = 0 response column is the results from the last response
in figure 3.2.1 and is the initial data that the continuation of experiment
I began with. The M = 1 response column begins the attempt to correct
the mistake V_c → V_2 to V_c → V_3 and includes the "random" mistake of
presenting event 1. The M = 2 and M = 3 response columns are the
continuing effort to correct V_c → V_2 to V_c → V_3 without "random" mistakes.

Except when the random mistake occurred, X_1 and y_1 remain small
and about the same magnitude as the duration effect "error" of event 3
in the first part of experiment I. We conclude that a "random" mistake
affects the memory of the outstar to a small extent.

Table 3.2.1 does show that the V_c → V_2 mistake is being corrected
to V_c → V_3, as X_2 and y_2 are decreasing while X_3 and y_3 are increasing.
However, from the numbers we can conclude that it will require many
presentations of the V_c → V_3 pattern before the magnitudes of X_3 and
y_3 exceed X_2 and y_2, and many more presentations of V_c → V_3 before
X_3 and y_3 bear the same relation to X_2 and y_2 as X_2 and y_2 had to X_3
and y_3 in the M = 0 response. In the meantime, it could be expected
that the x responses will have become unrealistically large.

The uncontrollable growth of the x responses makes this outstar
an unattractive device. Although it conforms to the theoretical
predictions, the actual means by which we measure its performance is
the x response and not the X probabilities. The growing x responses
mean that in our piano playing example, this outstar will be punching
holes through the keyboard of the piano with its fingers when it plays
a frequently used chord. Thus, to make this a useful device, we must
find some means of limiting the x responses at a practical amplitude.
As we pointed out, the growth of the x responses was due to the growth
of the z_ci processes, which determine the amplitude of prediction responses.
We had chosen the "forgetting rate" u of the z_ci processes to be small.
At the time we did so, it seemed reasonable to have the outstar
forget slowly. However, nondecaying z_ci processes have led us to an
undesirable situation. We will therefore try to control the amplitude
of the x response by increasing the "forgetting rate".
section 3.3    A Simple Outstar with a "Fast" Forgetting Rate

The forgetting rate of experiment I was selected to be "slow"
relative to the time scale of experiment I. In experiment I, the
characteristic decay time, 1/u, for the z_ci process was 100 seconds,
which was long compared to the 11 seconds total length of the experiment.
In that 11 seconds, the network was asked to learn one pattern and then
to correct it. The time between presentations and/or predictions was
1.8 seconds. Thus, when we speak of a "fast" forgetting rate, we must
decide "fast relative to what?".

To conserve computation time, we shall make the forgetting rate
fast relative to the presentation and/or prediction time interval,
i.e. 1/u = 1.8 seconds, or u = 0.556 sec^-1. This leads us into another
problem. The v of experiment I was selected on the "two presentations
mean well learning" criterion. That is, the z_ci process would get
large enough in two presentations of the pattern so that a prediction
following these presentations would drive the amplitudes of the x
processes to the same values as the input pulses alone would drive
them. If we expect the network to forget in a time comparable to the
presentation interval, it would be better to change v such that it
conforms to a "one presentation means well learning" criterion. We
will therefore double v to v = 3.2.
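
The size of the change in u can be checked with a few lines of
arithmetic. The sketch below is only an illustration (Python is not the
language of the original simulation, and the function name is
hypothetical). It computes the fraction of a z process that survives one
1.8 second interval under pure exponential decay, ignoring the growth
term, for the slow rate of experiment I and for the fast rate chosen
above.

```python
import math

def surviving_fraction(u, interval):
    """Fraction of z left after `interval` seconds of pure decay dz/dt = -u*z."""
    return math.exp(-u * interval)

INTERVAL = 1.8            # seconds between presentations and/or predictions
u_slow = 1.0 / 100.0      # experiment I: decay time 1/u = 100 seconds
u_fast = 1.0 / INTERVAL   # fast rate: decay time 1/u = 1.8 seconds

print(f"u_fast = {u_fast:.3f} per second")
print(f"slow: {surviving_fraction(u_slow, INTERVAL):.3f} of z survives")
print(f"fast: {surviving_fraction(u_fast, INTERVAL):.3f} of z survives")
```

With the slow rate about 98 percent of z survives from one presentation
to the next; with the fast rate only 1/e, about 37 percent, survives.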

To compare the fast forgetting rate outstar to the slow forgetting
rate outstar of experiment I, we shall re-perform the first part of
experiment I with all other parameters specified as they are in section
3.1. This experiment will be called experiment II.
Figure 3.3.1. The results of experiment II - a simple outstar
with a fast forgetting rate.
Figure 3.3.1 shows the results of this experiment. Because
the responses of this experiment were smaller than in experiment I,
the vertical scale for the x traces was doubled in figure 3.3.1.

As can be seen, we have managed to control the amplitudes of the x
responses reasonably well by allowing the z's to decay between
excitements of the command node. At least the z's do not exhibit the
monotonic growth they did in experiment I. The intended association

V  ->V~  was  learned  well.  Again  there  is  some  learning  of  V  — >   V„ 
c    j  c    j 

due  to  the  "tails"  of  the  prediction  signal.  The  non  zero  initial 
condition  on  z  ,  produced  no  perceptible  effect.  It  can  be  concluded 
that the outstar performs very well over short periods of time. However,
with its memory decaying rapidly, how long will its memory persist?
This question hits upon one of the key features of an outstar.
The mathematical theorem concerning outstars states that the outstar's
memory of a pattern remains unimpaired for all time after the last
presentation of the pattern, provided no new or random pattern is
presented to it subsequently. Of course, in the language of the theorem,
this meant that the y_i's would not change even though the z_ci's were
decaying exponentially. It looks as though a fast forgetting outstar has
the opposite problem from the slowly forgetting one. That is, the
responses, while retaining the proper x_i probabilities to define the
pattern, are so minute that they are meaningless when measured against a
practical scale. However, the third response on the z_c2(t) and z_c3(t)
traces shows that a prediction will cause the z processes to grow.

Now, suppose that the z processes have all decayed to the point
where a prediction by the outstar results in meaninglessly small grid
x responses. Then if enough predictions are made rapidly enough, we
can "pump up" the z's to the point where the x responses are large
enough to mean something. Grossberg's theorem ensures that the
amplitudes of the x processes will remain in the proper ratios to one
another. Experiment II was continued to demonstrate this memory
"pumping up" and the results are shown in figure 3.3.2. As can be
seen, the outstar's memory was allowed to decay for a while and then
the command event was presented to the outstar three times in rapid
succession. "Pumping up" occurred as expected.
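
The pumping up effect can be imitated with a toy model. The sketch
below is not the system of differential equations simulated in this
study. It assumes that each prediction adds to every z_ci an increment
proportional to the current y_i = z_ci / (sum of all z_ck), on the
grounds that the prediction response of a grid node is proportional to
y_i, and that the z's decay exponentially between predictions. The
function name, the gain, the interval between predictions and the
starting values are all hypothetical; only u is taken from the text.

```python
import math

def predict(z, u, dt, gain):
    """One prediction: decay for dt seconds, then grow in proportion to y_i."""
    decay = math.exp(-u * dt)
    total = sum(z)
    y = [zi / total for zi in z]          # pattern probabilities y_i
    return [decay * zi + gain * yi for zi, yi in zip(z, y)]

u = 0.556                 # fast forgetting rate, per second
z = [0.02, 0.01]          # decayed z values, still in the ratio 2:1
history = [z]
for _ in range(3):        # three predictions in rapid succession
    z = predict(z, u, dt=0.2, gain=0.5)
    history.append(z)

for step, (za, zb) in enumerate(history):
    # columns: step, total z amplitude, ratio between the two z's
    print(step, round(za + zb, 4), round(za / zb, 6))
```

In this toy model the total amplitude grows with every prediction while
the ratio between the two z's stays fixed, which is the behavior the
theorem leads us to expect.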

A psychological interpretation of memory pumping up would not be
tenuous. It is an everyday occurrence to have a piece of previously
learned information, a name say, on the "tip of one's tongue", but
not be able to recall it until all the associations connected to it
have been recalled. If we consider the name to be inscribed on the
grid of an outstar, then recalling things associated with the name
would be equivalent to rapid excitements of the command node. After
enough such excitements, the name would appear to "pop into one's
head". The memory of the name would then be "fresh" for some time after
being resurrected before it again faded into the "preconscious".
We will introduce a modification in section 3.5 which will make the idea
of a faded memory "popping" into the outstar's "head" more precise.

Of course, a presentation of the pattern after the outstar's z
processes have decayed to small values will also refresh its memory.


section 3.4    Resistance to Random Mistakes vs. Correction of
Learned Mistakes: A Philosophy for Learning in
Outstars

Experiment II was continued to investigate the effects of a
simulated random mistake on a simple outstar with a fast forgetting
rate. The results are shown in figure 3.4.1. Event 1 was presented
at the same time as event 2 to simulate the occurrence of a random
mistake in the pattern. As can be seen from the x and z_ci traces,
the random mistake completely confused the outstar. Whereas the outstar
had previously learned the association V_c -> V_2, occurrence of the
random mistake resulted in the outstar remembering V_c -> V_2 and, to
only a slightly lesser extent, V_c -> V_1. The amplitudes of the second
and third x_i prediction responses in figure 3.4.1 are significant
enough to conclude that the random event resulted in confusion. The
memory of a simple outstar with a fast forgetting rate has very little
resistance to random mistakes.

To understand the significance of this outstar's low resistance
to random mistakes, we must develop an understanding of the outstar's
relationship to its external environment. Up to now, we have been
concerned only with the internal workings of the outstar. Now consider
that the outstar is a machine which includes the outstar network
previously described plus an input apparatus. This machine "lives"
in an environment in which events occur. The input apparatus filters
the events occurring in the environment and delivers an input pulse
to the appropriate node in the outstar when one of the events the
outstar is capable of recognizing occurs. The outstar is capable
of learning the association between the command event and any events
which are represented by grid nodes if they occur approximately τ
time units after occurrence of the command event.

Figure 3.4.1. Continuation of experiment II from Figure 3.3.2.
P_1(t) simulates a random mistake in the pattern previously
taught to the outstar.

In order for the outstar's learning ability to conform with
intuitive notions about learning, we would want it to learn that the
command event is associated with a particular pattern if and only if
the occurrence of the command event in the environment is usually
followed by the occurrence of the pattern. Suppose the outstar observed
one bowling ball colliding with another with the result that the first
ball stopped dead and the second bowling ball rolled away from the
collision point with the same velocity that the first bowling ball had
before the collision. After the first observation of this event, we
would expect the intelligent outstar to suspect that it had observed a
law of nature that applied to all bowling ball collisions. We would
expect the outstar to go from a state of ignorance about the
conservation of momentum to an intuitive understanding of it.
Philosophically, we desire the outstar to be an inductive learning
machine.

If we describe this situation statistically, we may assign
probabilities to the occurrence of events in the environment. At any
given time, t, we may describe the likelihood of the occurrence of an
event associated with the V_i node in the outstar by the probability
PR_i. Additionally, we can describe the relationship between the
occurrence of events with the conditional probability PR_j/k, which is
the probability of the occurrence of event j given that event k occurred
recently. In the outstar we are particularly concerned with the
probabilities PR_i/c, where c is the command event and the i are the
grid events. To make the outstar an inductive learning machine, we want
it to learn V_c -> V_i if and only if PR_i/c is large. If PR_j/c is
small, we would want the outstar definitely not to learn V_c -> V_j.

On the first occurrence of a pattern following the command event
by approximately τ time units, the outstar can have no idea of how
large PR_i/c is. Therefore we would want it only to suspect that the
command event usually precedes this pattern. However, if the next
time the command event occurs, it is followed by the pattern, then
there is good evidence that PR_i/c is large and the outstar should
draw this conclusion. Now, in the real world, we expect background
noise. That is, if event j does not usually follow the occurrence of
event c, there is nevertheless a small probability that it will occur
as a random mistake sometime. In order to protect the outstar's
memory, we would want it to be resistant to drawing spurious
conclusions about the association of the command event with randomly
occurring mistakes. If the outstar observed a collision of bowling
balls in which one of the balls was shattered into many pieces, we would
not want this random occurrence to destroy its confidence in the
conservation of momentum.

The memory of a pattern in an outstar is contained in the
probabilities:

    $$ y_i(t) = z_{ci}(t) \Big[ \sum_k z_{ck}(t) \Big]^{-1} $$

The equation describing the z's is:

    $$ \dot{z}_{ci}(t) = -u z_{ci}(t) + v x_i(t) x_c(t - \tau) $$

In the case where u is very small, this is equivalent to:

    $$ z_{ci}(t) = v \int_{-\infty}^{t} x_i(\xi) x_c(\xi - \tau) \, d\xi $$

Define

    $$ I = v \int_{-\infty}^{\infty} x_i(\xi) x_c(\xi - \tau) \, d\xi $$

where x_i(t) is the response of a grid node to one input pulse in the
infinite time period, and x_c(t - τ) is the prediction signal from
the command node τ time units before the grid event. Thus, if in all
time preceding t, the command event has been presented to the outstar
M times,

    $$ z_{ci}(t) \sim M (PR_{i/c}) I $$
Thus, if PR_i/c is large, corresponding to a causal association between
event c and event i in the environment, z_ci(t) will be large. On the
other hand, if PR_j/c is small, corresponding to event j occurring
randomly and not causally associated with event c in the environment,
z_cj(t) will be small. Thus the z's can be considered random variables
faithfully reflecting the a priori conditional probabilities in the
environment. Note that this reflection of the statistical description
of the environment is contained in the amplitudes of the z's and is
built up by experience with M presentations of the pattern. The
resistance of the simple outstar with a slow forgetting rate in
experiment I to random mistakes was due to this correspondence between
the amplitudes of the z's and the a priori probabilities in the
environment. It may be concluded that whereas the outstar's memory of a
pattern is contained in the y_i(t)'s, its memory of its experience is
contained in the amplitudes of the z processes. Thus, when its memory
of its past experience is allowed to be forgotten at a fast rate as in
experiment II, the occurrence of a random mistake has disastrous
consequences for its memory of the pattern.
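
The proportionality between z_ci and M times PR_i/c times I for a very
small u can be illustrated numerically. The sketch below assumes u = 0,
so that every occurrence of grid event i after a command event adds
exactly I to z_ci. The function name, the probabilities, M, I and the
random seed are hypothetical values chosen only for illustration.

```python
import random

def accumulate(pr, presentations, increment, seed=0):
    """Add `increment` to z_ci each time event i follows the command event."""
    rng = random.Random(seed)
    z = [0.0] * len(pr)
    for _ in range(presentations):
        for i, p in enumerate(pr):
            if rng.random() < p:      # event i occurs after this command event
                z[i] += increment
    return z

PR = [0.9, 0.05]    # PR_i/c: a causally associated event and a random mistake
M = 10000           # number of command event presentations
I = 0.1             # growth of z_ci per correlated presentation
z = accumulate(PR, M, I)
estimates = [zi / (M * I) for zi in z]
print(estimates)    # each entry should lie close to the matching PR value
```

The z for the causally associated event ends up far larger than the z
for the random mistake, so a single further mistake changes the
y_i's very little. This is the noise resistance of the slowly
forgetting outstar.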

It is not surprising that a machine which forgets its past
experience rapidly will be very susceptible to having its mind changed.
We may look at this as both a benefit and a drawback. In the slow
forgetting outstar of experiment I, the attempt to change its mind
about a previously learned pattern by teaching it a new one was only
partially successful. It required only two presentations of the
original pattern for the outstar to learn it. However, the evidence
of the attempt to correct this pattern indicated that many more
presentations of the correcting pattern would be required to change its
mind. The outstar's resistance to random mistakes was laudable, but
its relative inability to change with changing times could be a serious
drawback in its environment. On the other hand, the fast forgetting
outstar will have no trouble changing its mind with the times, but
its low resistance to random mistakes is also a serious drawback.

We may summarize the above heuristic discussion of the constant
u in an outstar:

(a) A small u implies:

(i) past experience is slowly forgotten
(ii) high resistance to random mistakes
(iii) low correctability of previously learned mistakes.

(b) A large u implies:

(i) past experience is rapidly forgotten
(ii) low resistance to random mistakes
(iii) high correctability of previously learned mistakes.

In addition, we must consider one further effect of the constant u
on the performance of an outstar:

(c) A small u results in uncontrolled growth of the grid x
processes' amplitudes.

Again it is stressed that "large" and "small" u's refer to whether
the characteristic decay time 1/u is long or short relative to the
expected time interval between presentations to and/or predictions
by the outstar.

Because of condition (c) above, a practical simple outstar
requires a large u. Thus design improvements to the fast forgetting
outstar which result in greater resistance to random mistakes are
desirable. In the next several chapters we shall introduce more
complicated outstars which exhibit improved noise resistance
without the x process amplitude problems of the simple outstar.
However, for the present, we still have an avenue open for increasing
the simple outstar's noise resistance.

Part of the reason for the poor noise resistance of the simple
outstar in experiment II was that v was selected by the "one
presentation means well learning" criterion. Thus presentation of a
random mistake once resulted in its being well learned. Had we selected
a smaller v and required more presentations of the pattern in rapid
succession to result in well learning, then the effect of the random
mistake would be smaller. At the same time the correctability of
previously learned mistakes would decrease. If we wish to make the noise
resistance of the outstar very good by this method, then we must be
content with an outstar of slow intelligence that requires having a
pattern drummed into its head before it learns it; or, we could use the
pumping up phenomenon of the outstar and have it think about a pattern
presented to it many times in rapid succession before it is well
learned. Selection of the proper v to be used in an outstar is a design
decision which must take into account this trade-off.
section 3.5    The Occurrence of a Pattern of Events over a Period
of Time; Thresholds

In experiments I and II, the grid node events in a pattern were
always presented exactly τ time units after the command event.
The reason for this is that it takes τ time units for the x response
to the command input pulse to travel along the directed edges to the
arrowheads impinging on the grid nodes. Until the prediction signal
x_c(t - τ) arrives at the arrowheads there can be no correlation
between x_c(t - τ) and the x process response to a grid input pulse at
the adjacent grid node. Thus no learning can occur until the
prediction signal begins to arrive at the arrowheads. However, we
have seen indications that learning does occur with grid events
presented at times other than τ time units after presentation of the
command event. In this section we shall examine this phenomenon,
but first we must develop a notion that will make discussion of it
easier. If we are going to study how well an outstar learns
associations between the command event and grid events which may occur
more than or less than τ time units after presentation of the command
event, we will need a method of describing when these events occur.
Measuring the occurrence of grid events relative to the occurrence of
the command event is not a very good idea. No learning can occur
until the prediction signal has arrived at the arrowheads, and the
transmission time delay τ is a rather arbitrary time interval which
may be changed from outstar to outstar.

Figure 3.5.1. The upper traces show the input pulse used, the
resulting prediction signal, the response of a grid node to an
event of φ = 0 presentation phase, and the response of the z_ci(t)
process associated with that node. The bottom curve shows the
phase-correlation curve and the irreducible phase-correlation curve.

However, once the prediction signal begins to arrive at the
arrowheads, the outstar will begin to learn the pattern on the grid
nodes independent of how long it took the prediction signal to travel
from the command node. Thus a good reference point for describing the
occurrence of grid events is the time instant when the prediction signal
begins to arrive at the arrowheads. We shall denote this instant in
time as φ = 0 and let φ be the time, measured relative to φ = 0, at
which grid events are presented. Grid events which occur before
φ = 0 will be said to occur at negative values of φ and grid events
which occur after φ = 0 will be said to occur at positive values
of φ. φ shall be called the phase of an event with respect to the
prediction signal, or simply the presentation phase. To be precise,
φ will be defined as follows. Let t_p be the time instant at which
the prediction signal begins to arrive at the arrowheads. Let t_e be
the time instant at which a grid node input pulse begins to be
nonzero. Then φ is:

    φ = t_e - t_p

The following experiment was performed. A practical simple outstar
with a fast forgetting rate and many grid nodes was set up. The constant
v was selected to result in well learning in one presentation of a
grid event with φ = 0 presentation phase. Then the grid nodes were
excited with events presented with various presentation phases. The z
processes were all given zero initial conditions. The maximum amplitude
attained by each z process during the experiment was plotted against
the presentation phase φ. Lacking any better name for a curve showing
the variation of z process amplitudes with the presentation phase, the
curve shall arbitrarily be called a "phase-correlation" curve. A
phase-correlation curve is shown at the bottom of figure 3.5.1.



Figure 3.5.1 shows a variety of things besides a "phase-correlation"
curve. The top trace in figure 3.5.1 shows the shape and dimensions of
the input pulse used in the experiment. The x_c(t - τ) trace shows what
the prediction signal looked like as it arrived at the arrowheads. The
first response of the x_i(t) trace shows what the x process response
looked like for a grid node excited by an event presented with φ = 0
presentation phase. The second response on the x_i(t) trace shows what
a prediction response for this grid node looks like. The z_ci(t) trace
shows what the z_ci(t) process in the arrowhead impinging on the above
V_i grid node looked like. The irreducible phase-correlation curve
shown is related to the phase-correlation curve and will be explained
shortly.

The additional information shown in figure 3.5.1 is provided as
a pictorial look at the various processes going on in an outstar.
This information was gathered from a number of experiments and will
be compared to the results of the next section, in which we study the
effects of using other input pulses in an outstar. Thus the actual
numerical values for the amplitudes of the processes shown are
somewhat meaningless. To allow comparisons to be made, the data in
figure 3.5.1 were plotted as functions of various network parameters.

In the preceding experiments we have followed the convention,
in assigning values to the x process rise rate α and the input pulse
duration δ, of setting δ = 1/α. The time interval δ = 1/α
describes two important time intervals in the network: the input
pulse duration, and the rise time of the x processes. Since this
study is limited to input pulses of duration δ and since we have
assigned α such that 1/α = δ throughout, a natural selection
for a time unit among the experimental parameters is δ = 1/α.
The time axes in figure 3.5.1 are thus in terms of δ = 1/α.
Since the time constant associated with the z processes is the decay
time 1/u, this period is shown on the z traces.

The analytical solution for an x process responding to a
rectangular input pulse of amplitude A and duration δ presented at
time t = t_0 is:

    $$ x(t) = \begin{cases}
    (A/\alpha)\left(1 - e^{-\alpha [t - t_0]^+}\right) & \text{for } t_0 \le t \le t_0 + \delta \\
    (A/\alpha)\left(1 - e^{-1}\right) e^{-\alpha [t - t_0 - \delta]^+} & \text{for } t_0 + \delta \le t
    \end{cases} $$

where the notation [y]^+ is defined by:

    $$ [y]^+ = \begin{cases} y & \text{for } y > 0 \\ 0 & \text{for } y \le 0 \end{cases} $$

Note that this solution is valid independent of the numerical values
assigned to A, α, and δ as long as δ = 1/α.

Thus the amplitudes for the x processes are always proportional
to A/α and this combination of experimental parameters was used to
scale the amplitude axes for the x processes shown in figure 3.5.1.
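
The solution above can be checked against a direct numerical
integration. The sketch below assumes that an isolated x process obeys
dx/dt = -αx + P(t), with P(t) a rectangular pulse of amplitude A. That
equation is inferred here from the form of the solution rather than
quoted from this section, and the function names and numerical values
are hypothetical.

```python
import math

def x_response(t, t0, A, alpha):
    """Analytical x response to a rectangular pulse of duration 1/alpha."""
    delta = 1.0 / alpha
    if t <= t0:
        return 0.0
    if t <= t0 + delta:
        return (A / alpha) * (1.0 - math.exp(-alpha * (t - t0)))
    tail = math.exp(-alpha * (t - t0 - delta))
    return (A / alpha) * (1.0 - math.exp(-1.0)) * tail

def x_euler(t_end, t0, A, alpha, dt=1e-4):
    """Euler integration of the assumed equation dx/dt = -alpha*x + P(t)."""
    delta = 1.0 / alpha
    x, t = 0.0, 0.0
    while t < t_end:
        pulse = A if t0 <= t < t0 + delta else 0.0
        x += dt * (-alpha * x + pulse)
        t += dt
    return x

A, alpha, t0 = 2.0, 4.0, 0.1
for t in (0.2, 0.35, 0.8):
    print(t, x_response(t, t0, A, alpha), x_euler(t, t0, A, alpha))

# Peak of the response divided by A/alpha; equals 1 - 1/e for any A and alpha.
peak = x_response(t0 + 1.0 / alpha, t0, A, alpha)
print(peak / (A / alpha))
```

Changing A or α rescales the response but leaves the ratio of the peak
to A/α unchanged, which is why A/α serves as a scale for the x traces.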

The equation for the z processes is nonlinear and an analytical
solution was not found in this study. A combination of experimental
parameters was sought to scale the amplitude axes for the z process
traces and the phase-correlation curves. It was desired that a plot
of a z process against this scale factor would be the same for all
experiments even though the numerical values of the parameters in the
experiments were different. At the beginning of the experimental
study, the parameter combination v(A/α)δ seemed to work well and
was therefore adopted. However, later experiments showed that this
scale factor did not work well. Nevertheless, it was retained to allow
comparisons. With this explanation of the scales for the axes of the
plots in figure 3.5.1, we may proceed with a discussion of the
phase-correlation curve.

The phase-correlation curve in figure 3.5.1 shows the maximum
increase in amplitude of a z process due to the correlation between the
prediction signal and a grid node x process excited by an event
presented with presentation phase φ. As can be seen, the maximum
increase in amplitude for a z process occurs when a grid node is
excited by an event with φ = 0 presentation phase. Events presented
with φ ≠ 0, indicating that they were presented before or after the
arrival of the prediction signal at the arrowheads, result in a lesser
increase in z process amplitude. For |φ| > 3δ = 3/α, there is no
appreciable increase in z process amplitude.

The effect of the phenomenon revealed by the phase-correlation
curve may be interpreted in a number of ways. Suppose that a command
event is presented to the outstar at time t_c. Suppose further that a
collection of grid events, 1, 2, ..., M, usually accompany the occurrence
of the command event in the environment. However, suppose that these
grid events do not all occur at the same time. Let them occur
at times t_1, t_2, ..., t_M. The prediction signal generated by the
command event will arrive at the arrowheads at time t_c + τ. The
phase-correlation curve tells us that the outstar will learn to some
extent that all the grid events which occur at times t_i such that:

    |(t_c + τ) - t_i| < 3δ = 3/α

are associated with the command event. Note that t_i - (t_c + τ) is
the presentation phase φ_i for the ith event. The phase-correlation
curve tells us further that those events which occur at times t_j
such that:

    |(t_c + τ) - t_j| < 0.5δ = 1/(2α)

will be learned to be associated with the command event very well.

One interpretation of this information is that we now have a
means by which we can intelligently specify τ in an outstar. We have
said nothing about when a command event occurs relative to a pattern
of grid events. In everyday experience we are confronted with
situations in which the occurrence of a "command" event results in the
occurrence of a "pattern" of events. The time delay between occurrence
of the command event and the pattern varies. We learned that the
command event of switching an electric light switch resulted
almost immediately in the pattern of the electric lights in a room
going on. We also learned that the command event of putting a seed
in the ground resulted days later in the "pattern" of a plant sprouting.
In designing an outstar functioning in a "real" environment,
specification of τ should be made according to the average time delay
between occurrence of command events and the associated patterns that
the outstar is capable of learning. The phase-correlation curve tells
us what the standard deviation of this time delay can be and still
result in the outstar being able to learn.

On the other hand, the phenomenon shown by the phase-correlation
curve is a source of errors in an outstar avalanche. Suppose that
the command nodes in an avalanche command node cascade are so arranged
that the time interval between excitement of the V_cj command node and
the V_cj+1 command node is τ_c. This means that the avalanche takes
"pictures" of the time varying pattern of grid events every τ_c time
units to make a sampled data approximation of the pattern. From the
phase-correlation curve of figure 3.5.1 we can see that if τ_c is less
than 3δ = 3/α, the pictures taken by the outstars in the avalanche
will overlap one another. That is, the V_cj+1 outstar will learn to
some extent the same pattern of events that the V_cj outstar learns.
In particular, suppose that the pattern of events is varying rapidly
enough that the pattern of grid events at time t + δ is significantly
different from that at time t. To get an accurate sampled data
approximation in this situation, the avalanche would have to take
a "picture" every δ time units and we would set τ_c = δ. However,
the phase-correlation curve shows us that in this case the V_cj+1
outstar will learn not only the pattern of events on the grid when
its prediction signal arrives at the arrowheads, but also the pattern
of events that was on the grid when the prediction signal from the
V_cj outstar arrived at the arrowheads. In this situation, the
avalanche's sampled data approximation will be seriously in error.
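
The two thresholds read off the phase-correlation curve can be turned
into a rough design check for an avalanche. The sketch below is only a
bookkeeping aid under the assumption that the thresholds 0.5δ and 3δ
apply unchanged to every outstar in the cascade. The function names and
the three category labels are hypothetical.

```python
def learning_strength(phase, delta):
    """Classify a presentation phase using the thresholds read off the curve."""
    if abs(phase) < 0.5 * delta:
        return "well learned"
    if abs(phase) < 3.0 * delta:
        return "partly learned"
    return "not learned"

def pictures_overlap(tau_c, delta):
    """True if an outstar also learns the pattern sampled by its predecessor."""
    # The predecessor's pattern was on the grid tau_c time units earlier,
    # that is, at presentation phase -tau_c relative to this outstar.
    return learning_strength(-tau_c, delta) != "not learned"

delta = 1.0
for tau_c in (1.0, 2.0, 3.0, 4.0):
    print(tau_c, learning_strength(-tau_c, delta), pictures_overlap(tau_c, delta))
```

With τ_c = δ the predecessor's pattern falls inside the partly learned
range, which is the overlap described above. The overlap disappears only
when τ_c is at least 3δ.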

The  phenomena  shown  by  the  phases-correlation  curve  in  figure 
3.5«i  is  due  to  two  things.  First,  the  input  pulses  used  in  the  ex- 
periment were  rectangular  and  of  duration  S  ,  Suppose  that  the  equation 
for  the  x  processes  was  such  that  the  x  processes  exactly  reproduc ed 
the  input  pulse.  That  iss 

x(t)  =  P(t) 

Then  the  prediction  signal  and  the  x  processes'  responses  would  be 

rectangular  in  shape  and  of  duration  S   .  The  z   process  correlates 

the  prediction  signal  with  the  grid  node  x  process.  Thus  wo  could 

expect  the  z  process  amplitude  increase  due  to  a  correlation  to  be 

proportional  to  the  correlation  between  the  rectangular  prediction 

signal  and  the  rectangular  grid  node  x  process.  If  the  grid  node  is 

GG 


excited  by  an  event  which  occurs  with  presentation  phase  <p  with 
respect  to  the  arrival  of  the  prediction  signal,  we  get: 

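$z(t) \propto \left[ \int_{\varphi}^{\delta} dt \right]^{+}$  for $\varphi > 0$
$z(t) \propto \left[ \int_{0}^{\delta + \varphi} dt \right]^{+}$  for $\varphi < 0$

or

$z(t) \propto [\delta - \varphi]^{+}$  for $\varphi > 0$
$z(t) \propto [\delta + \varphi]^{+}$  for $\varphi < 0$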


This is just the correlation between two rectangular pulses of duration $\delta$ whose leading edges are separated in time by $\varphi$. This function is shown in figure 3.5.1 as the "irreducible phase-correlation" curve. This curve is called irreducible because it shows what the phase-correlation curve would look like if the x processes exactly reproduced the input pulse.
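The irreducible curve can be checked numerically. The following is a minimal sketch, not the simulation program used in this study: it assumes unit-amplitude rectangular pulses, takes the prediction pulse to occupy the interval from 0 to $\delta$, and compares a midpoint-rule estimate of the overlap with the closed form $[\delta - |\varphi|]^{+}$. The function names and the step size are illustrative assumptions.

```python
def pos(y):
    """Threshold notation [y]^+ : y when positive, otherwise 0."""
    return y if y > 0.0 else 0.0


def irreducible_correlation(phi, delta):
    """Closed-form overlap of two rectangular pulses of duration delta
    whose leading edges are separated by phi (unit amplitudes assumed)."""
    return pos(delta - abs(phi))


def numeric_overlap(phi, delta, dt=1e-4):
    """Midpoint-rule integral of the product of the two pulses.

    The prediction pulse is on for 0 <= t < delta; the event pulse is on
    for phi <= t < phi + delta. dt is an illustrative step size.
    """
    steps = int(round(delta / dt))
    total = 0.0
    for k in range(steps):
        t = (k + 0.5) * dt           # midpoint inside the prediction pulse
        if phi <= t < phi + delta:   # event pulse is also on at time t
            total += dt
    return total


if __name__ == "__main__":
    delta = 0.3
    for phi in (-0.45, -0.15, 0.0, 0.15, 0.3):
        print(phi, numeric_overlap(phi, delta), irreducible_correlation(phi, delta))
```

The numerical and closed-form values agree to within about one step size, and both vanish once $|\varphi| \ge \delta$.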

As we have seen, the x processes do not exactly reproduce the input pulses. This is because embedding field network nodes are low-pass filters. We have seen, and our analytical solution shows, that the x processes' response decays exponentially away from the maximum value it attained during the presentation of the input pulse. This exponentially decaying portion of an x process response will be called a "tail". These tails account for the difference between the irreducible phase-correlation curve and the phase-correlation curve. Because of the tails, events presented with presentation phase $\varphi$ such that $\varphi \ll 0$ still have nonzero amplitudes to correlate with the prediction signal when it arrives at the arrowheads. Prediction signals also have tails which correlate with grid node x process responses to events presented with presentation phase $\varphi \gg 0$. As can be seen, this effect begins to become important for events presented with presentation phase $|\varphi| < 3\delta$.
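The size of a tail can be estimated from the closed-form solution of $\dot{x}(t) = -\alpha x(t) + P(t)$ for a single rectangular pulse. The sketch below is illustrative only: it assumes the node starts at rest, and it borrows the values $A = 10$ and $\delta = 1/\alpha = 0.3$ sec from the parameter list of section 4.2. The function name is hypothetical.

```python
import math


def x_response(t, amp=10.0, alpha=1 / 0.3, delta=0.3):
    """Response of x' = -alpha*x + P(t) to one rectangular pulse.

    The pulse has amplitude amp and is on for 0 <= t <= delta; the node
    is assumed to start at rest (x = 0). Parameter values are assumptions
    taken from the section 4.2 parameter list.
    """
    if t < 0.0:
        return 0.0
    if t <= delta:
        # Rising portion while the pulse is on.
        return (amp / alpha) * (1.0 - math.exp(-alpha * t))
    # Exponentially decaying "tail" after the pulse ends.
    peak = (amp / alpha) * (1.0 - math.exp(-alpha * delta))
    return peak * math.exp(-alpha * (t - delta))


if __name__ == "__main__":
    peak = x_response(0.3)
    for k in (1, 2, 3):
        # Fraction of the peak remaining k pulse-durations after the pulse ends.
        print(k, x_response(0.3 + k * 0.3) / peak)
```

With $\delta = 1/\alpha$, the tail falls to $e^{-k}$ of the peak $k$ pulse durations after the pulse ends, so about 5 percent of the peak remains after $3\delta$.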

In an avalanche with a fast sampling rate, modifications of the component outstars that result in a phase-correlation curve which more closely resembles the irreducible phase-correlation curve are important. One modification would be to increase the x process rise rate $\alpha$. Making $\alpha$ very large will result in x process responses that very closely follow the shape of the input pulses. Thus the phase-correlation curve should be very close to the irreducible phase-correlation curve.

However, increasing $\alpha$ is not always possible. In this study, increasing $\alpha$ resulted in either intolerable errors or extremely lengthy computer runs to perform an experiment. Appendix A explains the trade-off between error and computation time in the selection of $\alpha$ for the digital simulations of this study.

If $\alpha$ cannot be increased enough to make the phase-correlation curve sufficiently close to the irreducible phase-correlation curve, there are other methods which will accomplish this. Grossberg has proposed the use of thresholds. The equations for a simple outstar with thresholds are:

3.5.1    $\dot{x}_c(t) = -\alpha x_c(t) + P_c(t)$

3.5.2    $\dot{x}_i(t) = -\alpha x_i(t) + \beta z_{ci}(t)[x_c(t - \tau) - \Gamma_c]^{+} + P_i(t)$

3.5.3    $\dot{z}_{ci}(t) = -u z_{ci}(t) + v[x_c(t - \tau) - \Gamma_c]^{+}[x_i(t) - \Gamma_i]^{+}$

where:

$[y]^{+} = y$ for $y > 0$, and $[y]^{+} = 0$ for $y \le 0$.

Figure 3.5.2. Illustration of the effects of thresholds on a simple outstar. Equivalent thresholds are placed on both the command node and the grid nodes. Note how close the phase-correlation curve is to the irreducible phase-correlation curve.

$\Gamma_c$ is the command node threshold. As can be seen, it prevents the prediction signal $x_c(t - \tau)$ from exciting a grid x process until $x_c(t - \tau) > \Gamma_c$. Additionally, it prevents the prediction signal from being correlated with the grid nodes' x processes until $x_c(t - \tau)$ is suprathreshold. The grid node threshold, $\Gamma_i$, performs the same function. In effect, these thresholds will cut off the "tails" of the x processes and thus should result in a phase-correlation curve which closely resembles the irreducible phase-correlation curve.
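How a threshold trims the tail can be illustrated with the closed-form response of a node to one rectangular pulse. The sketch below is an assumption-laden illustration rather than the procedure used in this study: it assumes the node starts at rest, ignores transmission delays, and uses hypothetical function names. It computes how long the response stays above a threshold, and the threshold for which that interval equals the pulse duration $\delta$.

```python
import math


def pos(y):
    """Threshold notation [y]^+ : y when positive, otherwise 0."""
    return y if y > 0.0 else 0.0


def peak_response(amp, alpha, delta):
    """Maximum of x' = -alpha*x + P(t) for a rectangular pulse (amp, delta)."""
    return (amp / alpha) * (1.0 - math.exp(-alpha * delta))


def suprathreshold_duration(gamma, amp, alpha, delta):
    """Length of the interval during which x(t) > gamma, for gamma > 0."""
    if gamma <= 0.0:
        raise ValueError("gamma must be positive; the tail never reaches zero")
    peak = peak_response(amp, alpha, delta)
    if gamma >= peak:
        return 0.0
    # Upward crossing on the rising portion: (amp/alpha)(1 - e^{-alpha t}) = gamma.
    t_up = -math.log(1.0 - gamma * alpha / amp) / alpha
    # Downward crossing on the tail: peak * e^{-alpha (t - delta)} = gamma.
    t_down = delta + math.log(peak / gamma) / alpha
    return t_down - t_up


def gamma_for_duration_delta(amp, alpha, delta):
    """Threshold that makes the suprathreshold interval exactly delta.

    Obtained by setting t_down - t_up = delta in the expressions above,
    which gives peak * (1 - gamma*alpha/amp) = gamma.
    """
    peak = peak_response(amp, alpha, delta)
    return peak / (1.0 + peak * alpha / amp)


if __name__ == "__main__":
    amp, alpha, delta = 10.0, 1 / 0.3, 0.3   # values assumed from section 4.2
    gamma = gamma_for_duration_delta(amp, alpha, delta)
    print(gamma, suprathreshold_duration(gamma, amp, alpha, delta))
```

Under these assumptions, a threshold of roughly 61 percent of the peak response leaves a suprathreshold interval equal to $\delta$.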

Figure 3.5.2 shows the results of an experiment conducted with an outstar with thresholds. The command node threshold $\Gamma_c$ used in this experiment was selected to make the time interval during which the prediction signal is suprathreshold approximately $\delta$ time units in duration, as can be seen from the $x_c(t)$ trace. The grid node threshold $\Gamma_i$ was selected to be the same, $\Gamma_i = \Gamma_c$. As can be seen, the phase-correlation curve very closely approximates the irreducible phase-correlation curve. Using thresholds, we could make an avalanche which could accurately sample a time-varying pattern every $2\delta$ time units. Without thresholds, the shortest accurate sampling interval is about $6\delta$ time units, as shown in figure 3.5.1. Thus the addition of thresholds has increased the accurate sampling rate for an avalanche by a factor of three.

However, this possible increase in the accurate sampling rate for an avalanche has not been obtained without a cost. The $x_2(t)$ traces and the $z_{c2}(t)$ traces in figure 3.5.2 are for a grid node excited by an event with presentation phase $\varphi = 0.5\delta$. Looking closely at the $x_2(t)$ trace, one can see that in the first response, $P_2(t)$ drove $x_2(t)$ above threshold and thus $z_{c2}(t)$ grew. However, on the second response, the excitement of $x_2(t)$ was insufficient to drive it suprathreshold and thus $z_{c2}(t)$ continued its exponential decay. Lacking the ability to drive $x_2(t)$ suprathreshold, the outstar cannot "pump up" the $z_{c2}(t)$ process and we must conclude that the memory $z_{c2}(t)$ is bound for extinction. In the same way, if the other z processes are allowed to decay further, prediction excitement of their grid nodes will also be unable to drive the corresponding x processes suprathreshold and all memory of the pattern would be bound for extinction. In the simple outstar without thresholds, we saw that no matter how much the z processes decayed, we could still recover the information stored in them by "pumping up". Thus, although a memory could fade due to forgetting, it could not be absolutely forgotten. An outstar with grid node thresholds can absolutely forget a pattern it has learned.

Figure 3.5.3. Effect on the phase-correlation curve of a threshold placed on the command node only.

To prevent a memory from being absolutely forgotten, we must set $\Gamma_i = 0$. This was done and a series of experiments was performed to determine the phase-correlation curve. Figure 3.5.3 shows the results. The only x process "tail" that was cut off by a threshold was the prediction signal's. Thus, the phase-correlation curve for $\varphi > 0$ is very close to the irreducible phase-correlation curve. This is because events with presentation phase $\varphi > 0$ occur after the prediction signal has arrived at the arrowheads. Cutting the prediction signal's tail off prevents it from correlating with x process responses to events presented with presentation phases greater than the time interval during which the prediction signal is suprathreshold. In this case, this meant no correlation with x processes responding to events presented with presentation phase $\varphi > \delta$. On the other hand, the grid node x processes retained their "tails" because $\Gamma_i = 0$. Thus the "tails" of grid node responses to events occurring before the prediction signal arrived at the arrowheads ($\varphi < 0$) were available for correlation. This explains why the phase-correlation curve for $\varphi < 0$ in figure 3.5.3 is similar to the phase-correlation curve for an outstar without thresholds.

In addition to making the phase-correlation curve for an outstar closer to the irreducible phase-correlation curve, thresholds may be used for an interpretive purpose. Since the prediction signal $[x_c(t - \tau) - \Gamma_c]^{+}$ could not affect the grid nodes until $x_c(t - \tau)$ was suprathreshold, we could follow the convention of saying that an x process at a node does not indicate a response by that node until it is suprathreshold. We could still set $\Gamma_i = 0$ in equation 3.5.3 and place an imaginary threshold on the grid nodes. With this interpretive convention, we have a concrete relationship between the amplitudes of the x processes and the psychological idea of a response from a subject. Additionally, the phenomenon of a faded memory popping up into the outstar's consciousness during "pumping up" is given a concrete interpretation.

section 3.6 Other Input Pulse Shapes

A short study was made of the effects on a simple outstar of using input pulses with shapes other than rectangular. The results were that there appear to be no qualitative differences in the performance of an outstar using any input pulse of duration less than or equal to $1/\alpha$. The sole exception to this qualitative finding was that the choice of input pulses does affect the shape of the phase-correlation curve.

Quantitatively, the input pulse did affect the maximum amplitude of the x responses. Additionally, the magnitude of $v$ needed to meet a specific well-learning criterion was affected.

One important result of this study was that the maximum amplitude of a grid node x process responding to a prediction signal alone occurred approximately $1/\alpha$ time units after arrival of the prediction signal for all input pulses. If we consider the input apparatus of the outstar to be a data sampler which samples the environment at time $t_0$ and delivers appropriate input pulses to the outstar's nodes, then this effect can be considered to be an inherent time delay in the outstar's prediction. That is, an event which occurs in the environment at time $t_0$ is predicted by the outstar to occur at time $t_0 + 1/\alpha$.

Figures 3.6.1, 3.6.2, and 3.6.3 show the results for the pulses used in this study. They should be compared to figure 3.5.1 which shows similar results for a rectangular pulse. The irreducible phase-correlation curves in these figures were computed by analytically correlating the input pulses.

Figure 3.6.1. The response of an outstar to a triangular input pulse.

Figure 3.6.2. The response of an outstar to an exponential input pulse.

CHAPTER 4     LATERAL INHIBITION

section 4.1     Introduction to Lateral Inhibition

The last chapter showed that a practical outstar (one with a fast forgetting rate) had the major drawback of either being a slow learner or having very low resistance to random mistakes. This was due to its inability to additively sum its past experience in the z processes because of the large decay rate. In this chapter we will study a more complicated outstar which retains all the desirable qualities of the simple outstar with a fast forgetting rate and has the further property that it is resistant to random mistakes.

The additive summing of past experience in the slowly forgetting outstar of chapter 3 resulted in good resistance to random mistakes because this outstar's experience with the correct pattern was so great that it could absorb mistakes. The opposite of this passive absorption of mistakes would be to use the past experience to actively suppress a mistake when it occurs. The psychological term for active suppression is inhibition. Figure 4.1.1 shows the geometric schematic and the equations for a laterally inhibiting outstar. The equations governing its performance are here repeated for convenience:

4.1.1    $\dot{x}_c(t) = -\alpha x_c(t) + P_c(t)$

4.1.2    $\dot{x}_i(t) = -\alpha x_i(t) + P_i(t) + \beta x_c(t - \tau) z_{ci}(t) - \beta^{-} \sum_{j=1,\, j \ne i}^{N} [x_j(t - \tau^{-})]^{+}$

4.1.3    $\dot{z}_{ci}(t) = -u z_{ci}(t) + v[x_c(t - \tau) x_i(t)]^{+}$

The notation $[y]^{+}$ means the maximum of the variable $y$ or 0, as in the case with thresholds. A short discussion of the significant differences between equations 4.1 and those for a simple outstar follows.

Figure 4.1.1. An outstar with lateral inhibition. The double lined directed edges transmit inhibitory signals. Only three grid nodes are shown (N = 3). The figure also lists the equations governing network performance, equations 4.1.1 through 4.1.3.


A negative prediction signal $-\beta^{-} \sum_{j \ne i} [x_j(t - \tau^{-})]^{+}$ has been added to the equation for the grid nodes' x processes. This is the net inhibitory signal sent to grid node $i$ from all the other nodes in the grid. These inhibitory signals are sent along the double lined directed edges in figure 4.1.1. The transmission delay from the originating node to the receiving node is $\tau^{-}$. Note that a grid node sends an inhibitory signal only if its x process is positive. With inhibition, it is possible for an x process to have negative amplitudes. We shall adhere to the convention of considering that a node is responding only if its x process is positive. Although we will be able to measure the negative excursions of the x processes, they shall be considered equivalent to zero amplitudes in the simple outstar.

In the simple outstar, zero or small amplitudes were interpreted as no response. In the laterally inhibiting outstar, negative amplitudes mean that the node is in an inhibited state. Using the above convention for interpreting the response of a node implies that an inhibited node is in a super non-responding state. Limiting a node's ability to affect other nodes via the inhibitory signals to only those times when its x process is positive is consistent with the above convention.

No learning occurs in the arrowheads of the inhibitory directed edges. The z process in those arrowheads can be considered to always have a value of unity.

Equation 4.1.3 for the z processes located in the arrowheads of the directed edges from the command node is the same as that for a simple outstar. Again, a node's inhibited state is ignored by the correlation driving function $v[x_c(t - \tau) x_i(t)]^{+}$. Thus the z processes can only have non-negative values. For this reason this outstar is an excitatory biased machine. We will have occasion in a later chapter to investigate outstars which allow negative z processes and are more neutrally biased.
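The behavior of equations 4.1.1 through 4.1.3 can be explored with a short forward-Euler integration. The following is a minimal sketch under stated assumptions, not the simulation program used in this study: the step size, the history-buffer treatment of the delays, the function names, and the default parameter values (taken loosely from the list in section 4.2, with $u$ rounded to 0.55) are all illustrative choices.

```python
def pos(y):
    """Threshold notation [y]^+ : y when positive, otherwise 0."""
    return y if y > 0.0 else 0.0


def simulate(events, command_times=(), n_grid=3, t_end=2.0, dt=0.001,
             alpha=1 / 0.3, beta=1.0, beta_inh=2.38, u=0.55, v=2.4,
             tau=0.3, tau_inh=0.1, amp=10.0, delta=0.3):
    """Forward-Euler integration of equations 4.1.1 to 4.1.3.

    events: dict mapping grid node index -> list of pulse onset times.
    command_times: onset times of command node input pulses.
    Returns (x_hist, z_hist), one list of grid values per time step.
    All defaults are assumptions made for this illustration.
    """
    def pulse(onsets, t):
        # Rectangular input pulse of amplitude amp and duration delta.
        return amp if any(t0 <= t < t0 + delta for t0 in onsets) else 0.0

    steps = int(round(t_end / dt))
    d_tau = int(round(tau / dt))        # command-to-grid delay in steps
    d_inh = int(round(tau_inh / dt))    # inhibitory delay in steps
    xc_hist = [0.0]
    x_hist = [[0.0] * n_grid]
    z_hist = [[0.0] * n_grid]
    for k in range(steps):
        t = k * dt
        xc, x, z = xc_hist[k], x_hist[k], z_hist[k]
        xc_del = xc_hist[k - d_tau] if k >= d_tau else 0.0
        x_del = x_hist[k - d_inh] if k >= d_inh else [0.0] * n_grid
        new_x, new_z = [], []
        for i in range(n_grid):
            # Net inhibition from all other grid nodes (positive parts only).
            inhib = sum(pos(x_del[j]) for j in range(n_grid) if j != i)
            dx = (-alpha * x[i] + pulse(events.get(i, ()), t)
                  + beta * xc_del * z[i] - beta_inh * inhib)
            dz = -u * z[i] + v * pos(xc_del * x[i])
            new_x.append(x[i] + dt * dx)
            new_z.append(z[i] + dt * dz)
        xc_hist.append(xc + dt * (-alpha * xc + pulse(command_times, t)))
        x_hist.append(new_x)
        z_hist.append(new_z)
    return x_hist, z_hist


if __name__ == "__main__":
    # Command event at t = 0, grid event at node 1 arriving tau later.
    x_hist, z_hist = simulate({1: [0.3]}, command_times=[0.0])
    print("final z values:", z_hist[-1])
```

In this sketch, setting `beta_inh=0.0` recovers the simple outstar without thresholds, which makes side-by-side comparisons of the two networks straightforward.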

The rationale for lateral inhibition is to have a responding grid node inhibit all the other grid nodes. When several grid nodes are responding at the same time, we expect the node responding with the greatest amplitude to inhibit the other nodes the most while suffering the least inhibition itself. When a random mistake occurs in a previously learned pattern, the prediction signal inputs to the grid nodes will cause the nodes corresponding to events in the pattern to respond with greater amplitude than the nodes corresponding to the mistake. This will result in inhibition of the response to the mistake.

section 4.2    Experimental Study of an Outstar with Lateral Inhibition

To test the claim that the laterally inhibiting outstar has good noise resistance, we shall repeat experiment II which was performed with the simple outstar. All the parameter specifications for that experiment will be retained. However, we have two new parameters to specify, $\beta^{-}$ and $\tau^{-}$.

If it takes too long for inhibitory signals to travel along their directed edges, then we shall have defeated the purpose of lateral inhibition by having inhibiting signals arrive after the damage has been done. Thus $\tau^{-}$ should be small. Lateral inhibition would be most effective if $\tau^{-} = 0$, but we shall observe the constraints on transmissions along directed edges set up in chapter 1. With these arguments in mind, $\tau^{-}$ is selected to be:

$\tau^{-} = \frac{1}{3\alpha} = \frac{\delta}{3} = \frac{\tau}{3}$

A rational guess for $\beta^{-}$ is difficult. In order to specify it most efficiently we would need some idea of the average number of grid events composing a pattern and the average number of events that compose a random mistake. The reason for desiring this information when selecting $\beta^{-}$ is obvious. Suppose that we had two patterns we wished to teach to two outstars sharing the same grid. Pattern $\theta_1$ is composed of one event. Pattern $\theta_2$ is composed of $n$ events where $1 < n < N$ and $N$ is the number of grid nodes. Then the node corresponding to the event in pattern $\theta_1$ will not be inhibited at all. However, the nodes corresponding to events in $\theta_2$ will inhibit each other and the node responses to $\theta_2$ will have a diminished amplitude. Thus the z correlations for $\theta_2$ will be smaller and it will require many more instructions to learn $\theta_2$ than it will require to learn $\theta_1$. Any selection for $\beta^{-}$ will work well in learning $\theta_1$. On the other hand, an excessively large $\beta^{-}$ will result in very inefficient learning of $\theta_2$.

However, we want $\beta^{-}$ large enough to inhibit random mistakes. Thus we are faced with a trade-off between inefficient learning and the proper degree of inhibition to counter mistakes. A foreknowledge of the average situation to expect would greatly aid in the proper selection of $\beta^{-}$. Of course, if we wanted our outstars to be completely unbiased at the beginning of the experiment, we could make a large number of them with various $\beta^{-}$ and turn them loose in the environment. Survival of the fittest would soon select the optimal $\beta^{-}$.

For the purposes of this study, it was decided to select $\beta^{-}$ on the idea that at most two events would compose a pattern and a random mistake on the average would consist of one event. $\beta^{-}$ was chosen to allow the inhibitory signal from excited nodes to drive an unexcited node to approximately one-half the amplitude of the excited node. A brief analysis was made to meet this criterion as follows:

The maximum amplitude of an x process excited by a rectangular pulse of amplitude $A$ and duration $\delta = 1/\alpha$ is $\max(x_i(t)) = (A/\alpha)(1 - e^{-1}) = 0.63\,A/\alpha$.

The amplitude of such an input that results in $(1/2)\max(x_i(t))$ is $(1/2)A$. Hence

$\beta^{-}(0.63)(A/\alpha) = (1/2)A$

or $\beta^{-} = \alpha/1.26$;

for $\alpha = 3.333$, $\beta^{-} \approx 2.64$.

An experimental check of this resulted in $\beta^{-} = 2.38$. The 11% error is due both to the naivete of the analysis and to errors inherent in the digital simulation.
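The arithmetic of this estimate is easy to reproduce. The sketch below simply re-evaluates the expressions above without rounding $1 - e^{-1}$ to 0.63; the function names are illustrative, and measuring the discrepancy relative to the experimental value is an assumption about how the 11% figure was obtained.

```python
import math


def estimate_beta_inh(alpha):
    """Analytic estimate: beta_inh * (1 - e^{-1}) * (A/alpha) = A/2.

    The pulse amplitude A cancels, leaving alpha / (2 * (1 - e^{-1})).
    """
    return alpha / (2.0 * (1.0 - math.exp(-1.0)))


def relative_error(estimate, measured):
    """Discrepancy of the estimate, relative to the measured value (assumed convention)."""
    return (estimate - measured) / measured


if __name__ == "__main__":
    alpha = 1 / 0.3                       # 3.333 per second
    est = estimate_beta_inh(alpha)
    print("estimated beta_inh:", round(est, 3))
    print("relative error vs 2.38:", round(100 * relative_error(est, 2.38), 1), "percent")
```

The unrounded estimate comes out slightly below 2.64 because the text rounds $2(1 - e^{-1})$ to 1.26.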

Inadvertently, $v$ was changed to 2.4, resulting in a criterion of well learning in one and one-half presentations. This minor discrepancy is not sufficient to prevent comparison with experiment II. For convenience the major parameters used are listed here:

Network parameters:

$\alpha = 3.3333\ \text{sec}^{-1} = 1/\delta$
$\beta = 1.0$
$u \approx 0.55\ \text{sec}^{-1}$
$v = 2.4$
$\tau = 0.3$ sec
$\tau^{-} = 0.1$ sec
$\beta^{-} = 2.38$

Input pulse parameters:

$A = 10$
$\delta = 0.3$ sec $= 1/\alpha$

The equations governing the performance of the laterally inhibiting outstar are:

$\dot{x}_c(t) = -\alpha x_c(t) + P_c(t)$

$\dot{x}_i(t) = -\alpha x_i(t) + P_i(t) + \beta z_{ci}(t) x_c(t - \tau) - \beta^{-} \sum_{j=1,\, j \ne i}^{N} [x_j(t - \tau^{-})]^{+}$

$\dot{z}_{ci}(t) = -u z_{ci}(t) + v[x_c(t - \tau) x_i(t)]^{+}$


Experiment III was begun by teaching the outstar the pattern $V_c \to V_2$ by two presentations of event 2, $\tau$ time units after presentation of the command event. Figure 4.2.1 shows the result. The prediction response on the $x_2(t)$ trace shows that $V_c \to V_2$ was well learned, as is indicated by the $z_{c2}(t)$ trace. Figure 4.2.1 should be compared with figure 3.3.1 which shows the result of the same pattern being taught to a practical simple outstar.

Of interest in figure 4.2.1 is the fact that a minor association of $V_c$ with $V_3$ did not occur to any significant extent. Event 1 was presented with presentation phase $\varphi = 0$ with respect to arrival of the prediction signal at the arrowheads. That is, event 1 was presented at the exact time instant that the prediction signal arrived at the arrowheads. In the discussion of the phase-correlation curves of section 3.5, we saw that presenting an event with presentation phase $\varphi = 0$ results in the greatest increase in the amplitude of the z process. In this sense, an event presented with presentation phase $\varphi = 0$ is learned best. Events presented with presentation phase $\varphi \ne 0$ are learned to a lesser extent. In figure 4.2.1, event 3 was presented 0.6 seconds after event 1. That is, event 3 was presented with presentation phase $\varphi = +0.6$ seconds $= 2\delta = 2/\alpha$. From the phase-correlation curve for a simple outstar without thresholds in section 3.5, we saw that presenting an event with presentation phase $\varphi = +2\delta = +2/\alpha$ resulted in a significant increase in the associated z process's amplitude. The addition of thresholds to the simple outstar prevented any increase in the associated z process by cutting off the "tails" of the x processes.

The laterally inhibiting outstar currently under study does not have thresholds. However, from the fact that only a very minor association $V_c \to V_3$ was learned in figure 4.2.1, it appears that lateral inhibition has some of the same effects that thresholds have on the performance of an outstar. We will investigate this performance in detail in section 4.4.

Experiment III was continued to check the claim that lateral inhibition increases resistance to random mistakes. Figure 4.2.2 shows the result of presenting the previously learned pattern $V_c \to V_2$ with a simulated random mistake, event 1. As the $x_1(t)$ and $z_{c1}(t)$ traces show, the mistake was inhibited to the point where $V_c \to V_1$ was learned to only a minor extent. (Compare to figure 3.2.3.) Additionally, predictions following the mistake presentation resulted in the $x_1(t)$ process being totally inhibited. This resulted in the memory of the mistake decaying towards extinction, as shown by the $z_{c1}(t)$ trace. Of course the dramatic results shown in figure 4.2.2 were due to the comparative freshness of the pattern $V_c \to V_2$ in the outstar's memory, as shown by the large amplitude of $z_{c2}(t)$ when the mistake was presented. The memory of $V_c \to V_2$ will fade as $z_{c2}(t)$ decays. If the memory is sufficiently faded, we would not expect the resistance to random mistakes to be as good.

This has parallels in everyday experience. Students are less likely to be deceived by a tricky question in an examination when the subject matter is fresh in their minds.

Lateral inhibition does not prevent the outstar from correcting a previously learned pattern which is in error. Experiment III was continued to correct the previously learned pattern $V_c \to V_2$ with a new pattern $V_c \to V_1$. Figure 4.2.3 shows the results. As can be seen, two presentations of the new pattern were sufficient to totally inhibit the old pattern and ensure extinction of its memory.


[Figure: traces of the input pulses and of the x and z processes plotted against time in seconds, for input pulse amplitude A = 10. Legible trace labels include $x_c(t)$, $P_2(t)$, $x_2(t)$, $z_{c2}(t)$, and $z_{c3}(t)$. The figure number and caption are illegible.]

section 4.3    Advantage of Correcting a Learned Mistake with Lateral Inhibition

In section 3.4 we discussed the effects of the forgetting rate $u$ on a simple outstar's resistance to random mistakes and on its ability to correct learned mistakes. The conclusion was that a small $u$ resulted in good random mistake resistance, but very low correctability. A large $u$ had the opposite effect. From the outstar's point of view, the only difference between a random mistake and a correction to a previously learned pattern is that the random mistake occurs infrequently with the command event whereas the correcting pattern usually occurs with the command event. It was shown in section 3.4 that the outstar remembered the difference between an event which infrequently occurs with the command event and one which usually occurs with the command event in the accumulated past experience contained in the amplitudes of its z processes. With a small $u$ the past experience was not forgotten rapidly and resulted in a great accumulation of experience. It was not surprising that an infrequent variation in the pattern had a small effect on the accumulated experience. On the other hand, a great accumulation of past experience with a pattern makes it very difficult to convince the outstar that the pattern was an error. Due to the fast rate of forgetting past experience in the large $u$ outstar, little accumulation of experience occurred, resulting in its poor random mistake resistance and good correctability. Thus by interpreting the amplitudes of the z processes as accumulated past experience it seemed very reasonable to conclude that good random mistake resistance and correctability were incompatible.

Figures 4.2.2 and 4.2.3 show that this need not be the case. The laterally inhibiting outstar has both good resistance to random mistakes and good correctability. Lateral inhibition was introduced to make an outstar with a fast forgetting rate more resistant to random mistakes. It might have been expected that this would decrease its correctability. We shall inquire why it did not.

In the slowly forgetting simple outstar the only way a pattern can be corrected is by brute force. The amplitude of grid node x process responses is a linear function of the sum of the event input pulses and prediction signal inputs:

$\dot{x}_i(t) = -\alpha x_i(t) + \beta z_{ci}(t) x_c(t - \tau) + P_i(t)$

Thus the amplitude of a grid node x process response is greater when there is an event input pulse than when there is a prediction signal input alone. Therefore the correlating signal $v x_c(t - \tau) x_i(t)$ is greater when there is an event input pulse and the z process grows faster. In correcting a pattern in a slowly forgetting simple outstar, we simply stop presenting the events of the erroneous pattern and start presenting the events of the correcting pattern. As was shown in figure 3.2.2, the additional amplitude of the grid node x processes due to the correcting event input pulses results in the z processes associated with the correcting pattern events growing faster than the z processes associated with the erroneous pattern. By the outstar theorem we are assured that eventually the probabilities $X_i(t)$ and $y_i(t)$ will go from values describing the erroneous pattern to values describing the correcting pattern. However, we have seen that in the slowly forgetting outstar, the x process amplitudes will have become impractically large long before this happens.


In the rapidly forgetting outstar, we do not have the problem of impractically large amplitude x processes. Further, in trying to correct a previously learned pattern we are aided by the rapid forgetting rate. In addition to the effects of the brute force correcting process, the rapidly forgetting outstar forgets the erroneous pattern while it is learning the correcting pattern (provided, of course, that the excitations of the command node are spaced far enough apart not to result in significant pumping up of the erroneous pattern). Thus in addition to the active process of forcing the z processes associated with the correcting pattern to grow larger than those associated with the erroneous pattern, there is the passive process of forgetting the old pattern. As has been emphasized, this passive forgetting process results in the better correctability of the rapidly forgetting simple outstar as well as its low resistance to random mistakes.

In the laterally inhibiting outstar, we retained the fast
forgetting rate to control grid node x process amplitudes. Thus we
have both the active brute force correcting process and the passive
forgetting process working to correct an erroneous pattern. If we
look closely at figures 4.2.2 and 4.2.3, we can see the effect of
lateral inhibition in both random mistake correction and pattern
correction. In figure 4.2.2, presentation of the previously learned
pattern $V_c \rightarrow V_2$ with the simulated random mistake
$V_c \rightarrow V_1$ resulted in growth of both $z_{c1}(t)$ and
$z_{c2}(t)$. However, the sum of the input pulse $P_2(t)$ and the input
prediction signal $\beta z_{c2}(t)\,x_c(t-\tau)$ drove $x_2(t)$ to a
greater amplitude than $x_1(t)$ was driven by $P_1(t)$ alone.
Therefore $x_1(t)$ was diminished by the inhibiting signal from
$x_2(t)$, and $z_{c1}(t)$ did not grow to a very large amplitude. Both
$z_{c1}(t)$ and $z_{c2}(t)$ decayed. On subsequent predictions the
prediction input signal for $V_1$ was not sufficient to overcome the
inhibitory signal from $V_2$, and the correlating signal
$v[x_c(t-\tau)\,x_1(t)]^+$ was zero. Thus $z_{c1}(t)$ was unable to grow
on subsequent predictions and the fast forgetting rate insured that
the random mistake would be totally forgotten.

We have said that a random mistake occurs infrequently. Thus
we can expect that the fast forgetting rate will insure that the random
mistake will be forgotten before it occurs again during presentation
of the pattern, and there will be no accumulation of experience with
the mistake in $z_{c1}(t)$. Now, look at the successful correction of
the previously learned pattern $V_c \rightarrow V_2$ with
$V_c \rightarrow V_1$ in figure 4.2.3. It is seen that on the first
presentation of the correcting pattern, the accumulated experience
with the erroneous pattern was still greater than the experience
accumulated on the first presentation of the correcting pattern. At
this point the outstar could not be aware that $V_c \rightarrow V_1$ is
a correcting pattern and not a random mistake. However, on the next
presentation of $V_c \rightarrow V_1$, the experience now accumulated
with $V_c \rightarrow V_1$, coupled with the event input pulse, is
sufficient to drive $x_1(t)$ to a greater amplitude than prediction
alone can drive $x_2(t)$. Consequently, $x_2(t)$ is inhibited and
$z_{c2}(t)$ grows very little. The fast forgetting rate now insures
that $z_{c2}(t)$ will decay to a point where a third presentation of
the correcting pattern will completely inhibit prediction of
$V_c \rightarrow V_2$, and it is impossible thereafter to accumulate
any more experience with $V_c \rightarrow V_2$ by prediction. At this
point we can say that the pattern has been corrected.

It is a combination of brute force correcting resulting in
accumulation of experience with the correcting pattern, rapid forgetting
of the erroneous pattern, and use of accumulating experience to
inhibit the erroneous pattern which accounts for the correctability
property of a laterally inhibiting outstar. The same combination
of processes results in its random mistake resistance. It is the
inability of a simple outstar to couple accumulation of experience
with forgetting that results in the incompatibility of random mistake
resistance with correctability.

Because of the inability to control the amplitudes of grid node
responses with small u's, we will not undertake to study the variation
of these properties in a laterally inhibiting outstar with a fast
forgetting rate. In chapter six, we will present a different
formulation of the outstar equations which controls the amplitudes of
the grid node responses independent of the amplitudes of the z
processes and incorporates a form of lateral inhibition. At that time
we will consider the effect of decreasing the forgetting rate on the
properties of a laterally inhibiting outstar.


section 4.4     Further Remarks on Local Lateral Inhibition

In the first part of experiment III, figure 4.2.1, it was noted
that lateral inhibition appears to have some of the same effects
that thresholds have on the performance of outstars. The evidence
was that presentation of event 3, 0.6 seconds $= 2\delta = 2/\alpha$
after arrival of the prediction signal at the arrowheads, did not
result in any learning of $V_c \rightarrow V_3$. Further investigation
shows that this result is of dubious value.

The x(t) trace in figure 4.2.3 shows the inhibitory response of
a node to a single input pulse at another node in the grid. The
maximum of this inhibitory response occurs approximately
$2\delta = 2/\alpha$ time units after arrival of the inhibitory signal
at the node. Thus the maximum inhibitory response occurs at
approximately $\tau^- + 2/\alpha$ time units after beginning excitement
of the other node. The result is maximum inhibition of events presented
a little less than $\tau^- + 2\delta$ after beginning excitation of a
grid node. Now, if an event had been presented $\tau^- + 2\delta$ before
arrival of the prediction signal, the event presented with $\phi = 0$
presentation phase relative to the arrival of the prediction signal at
the arrowheads would have been most inhibited, and little learning of
this event would have resulted. In effect, this means that to avoid
inhibiting an event to be associated with the command event of one
outstar sharing the grid with other outstars, the interval between
event presentations must be greater than approximately
$\tau^- + 4/\alpha$ time units.

The reason for the maximum of an inhibitory response occurring
so long after excitation of a node can be seen analytically. If the
total input signal, $I_i(t)$, to an embedding field network node, $V_i$,
is a linear, time invariant function of time, then the node's x process
has a transfer function $1/(s + \alpha)$, such that:

     $x_i(s) = I_i(s)/(s + \alpha)$.

A cascade of n nodes has a transfer function of $(1/(s + \alpha))^n$.
Due to the short duration of our input pulses, we are dealing
essentially with the transient response of the x process. Thus the
inverse transform of $1/(s + \alpha)^n$ is a good indication of what our
pulse should look like after having traveled through n nodes:

     $\mathcal{L}^{-1}\left[\dfrac{1}{(s+\alpha)^n}\right] = \dfrac{1}{(n-1)!}\,t^{\,n-1}e^{-\alpha t} = x_n(t)$

The maximum of this occurs where:

     $\dfrac{dx_n(t)}{dt} = 0$,   that is, at   $t = (n-1)/\alpha$.
Thus the more nodes a pulse travels through, the later its maximum
occurs. Of course, the input signal to a grid node in a laterally
inhibiting outstar is partially nonlinear. However, if we consider
that the z processes vary slowly enough so that we can consider them to
be approximately constant, then the above analysis approximately holds.
Thus in the case where an input is given to one node which inhibits
another, we have an n = 2 node cascade, and:

     $x_2(t) = t\,e^{-\alpha t}$

with maximum at:

     $t = 1/\alpha$   after arrival of the inhibitory signal.

Now, if we add a prediction signal from the command node also, we have
an n = 3 node cascade and:

     $x_3(t) = \tfrac{1}{2}\,t^2 e^{-\alpha t}$

with maximum at:

     $t = 2/\alpha$   after arrival of the inhibitory signal.

Thus the occurrence of the maximum inhibitory response between
$\tau^- + 1/\alpha$ and $\tau^- + 2/\alpha$ is inherent to the network
according to the approximate analysis. The experimental evidence shows
that this approximate analysis is reasonably correct. We will have
further occasion to consider this "lengthening" of pulses as they go
through successive nodes when we study outstar avalanches.
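
The peak times given by the approximate analysis can be checked
numerically. The following Python sketch is an illustration only and is
not one of the simulations reported in this study; the function names and
the simple grid search are assumptions made for the illustration. It
evaluates the transient response $t^{n-1}e^{-\alpha t}/(n-1)!$ and locates
its maximum, which should fall close to $(n-1)/\alpha$.

```python
import math


def cascade_response(n, alpha, t):
    """Transient response of n identical nodes 1/(s + alpha) in cascade."""
    return t ** (n - 1) * math.exp(-alpha * t) / math.factorial(n - 1)


def peak_time(n, alpha, t_max=5.0, steps=20000):
    """Locate the maximum of the response by a search over a fine time grid."""
    best_t, best_x = 0.0, -1.0
    for k in range(steps + 1):
        t = t_max * k / steps
        x = cascade_response(n, alpha, t)
        if x > best_x:
            best_t, best_x = t, x
    return best_t


if __name__ == "__main__":
    alpha = 1 / 0.3  # assumed decay rate, matching a 0.3 second rise time
    for n in (2, 3, 4):
        # Numerical peak time beside the analytic value (n - 1)/alpha.
        print(n, round(peak_time(n, alpha), 3), round((n - 1) / alpha, 3))
```

The grid search is deliberately crude; it is meant only to confirm the
location of the maximum, not to replace the analytic result.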

The earlier prediction that a $\beta^-$ suitable for learning a pattern
of one event results in inefficient learning of patterns with more than
one event was tested. An N = 4 grid node laterally inhibiting outstar
was used. v = 3.2 was selected to result in well learning of one
event in one presentation and this was experimentally verified. All
other parameters were the same as in experiment III. The initial
conditions on the z processes were reset to zero. Three events were
presented to the grid $\tau$ time units after excitation of the command
node. A prediction was requested 1/u time units later. The results
are shown in figure 4.3.1. The pattern
$V_c \rightarrow (V_1, V_2, V_3)$ was learned very poorly. From this
evidence it can be concluded that it would require many more rapid
presentations of this pattern to result in well learning.

Of course with lateral inhibition any $\beta^- > 0$ will result in
faster learning of a pattern with fewer events. If we consider the
number of elemental events as a measure of the complexity of a pattern,
then this effect translates into the statement that a complicated
pattern is harder to learn. A laterally inhibiting outstar has some of
the same drawbacks as the human mental process.
CHAPTER  5     THE  OUTSTAR  AVALANCHE
section 5.1     Introduction

In section 2.1 the outstar avalanche was briefly introduced.
Its geometric schematic and equations were shown in figure 2.1.2,
which is here repeated for convenience. The basic idea behind the
avalanche is to arrange the command nodes of many outstars in a
linear cascade. Excitement of the first node in the cascade results
in a prediction signal arriving at the jth command node of the cascade
$j\tau$ time units later. Thus each outstar in the avalanche takes a
picture of the time varying pattern on the grid at integer multiples
of $\tau$. The result is that the avalanche can learn and reproduce a
sampled data approximation of a time varying pattern of events. The
starting command node in the cascade represents an event which is
associated with the start of the time varying pattern.

EQUATIONS GOVERNING NETWORK PERFORMANCE

2.1.4   $\dot{x}_{c1}(t) = -\alpha x_{c1}(t) + P_{c1}(t)$

2.1.5   $\dot{x}_{ci}(t) = -\alpha x_{ci}(t) + \beta x_{c,i-1}(t-\tau)$     for $1 < i \le M$

2.1.6   $\dot{x}_j(t) = -\alpha x_j(t) + \beta \sum_{i=1}^{M} z_{ci,j}(t)\,x_{ci}(t-\tau) + P_j(t)$     for $1 \le j \le N$

2.1.7   $\dot{z}_{ci,j}(t) = -u z_{ci,j}(t) + v\,x_{ci}(t-\tau)\,x_j(t)$

Figure 2.1.2.  An outstar avalanche and the equations governing its
performance.

The linear command node cascade essentially acts as a clock to
determine when the data samples are taken. In order to perform this
function we would want the response of each node in the cascade to
the prediction signal from the node immediately before it to be
approximately the same as that of every other node. This is, however,
not the case with the outstar avalanche arrangement shown in figure
2.1.2. The reason was discussed in section 4.4, where we noticed that
the response of nodes in a cascade got longer the more nodes a signal
passed through. Based on the transient response of such a linear
cascade, we analytically computed that the maximum of the nth node's
response occurred at $(n-1)/\alpha$. A short experiment was conducted
to test this result. Figure 5.1.1 shows a linear cascade of four nodes
excited by a rectangular pulse at node $V_{c1}$. As can be seen from the
traces $x_{c1}(t)$ through $x_{c4}(t)$, the responses did lengthen by
approximately $(n-1)/\alpha$. The equations used in this experiment
were:

     $\dot{x}_{c1}(t) = -\alpha x_{c1}(t) + P_{c1}(t)$

     $\dot{x}_{ci}(t) = -\alpha x_{ci}(t) + \beta x_{c,i-1}(t-\tau)$     for $i \ge 2$

The growing amplitude of successive node responses in figure
5.1.1 is due to the fact that $\beta$ was selected to result in the
$x_{c2}(t)$ response being of approximately the same maximum amplitude
as the $x_{c1}(t)$ response. For the parameter selection shown in
figure 5.1.1, this resulted in a $\beta$ of:

     $\beta = \dfrac{\alpha}{1 - e^{-1}}$

However, the steady state response of a node with transfer function
$1/(s + \alpha)$ to a step input is to amplify the step's amplitude by
$1/\alpha$. Thus, in order to maintain approximately equal amplitude
responses in a cascade, $\beta$ should be selected to be:

     $\beta = \alpha$

The $\beta$ in the experiment shown in figure 5.1.1 was too large and
resulted in the amplitude growth shown.
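
The effect of the choice of $\beta$ on the cascade can be illustrated
with a short simulation. The Python sketch below is not the program used
for figure 5.1.1; it is a minimal Euler integration of the two cascade
equations under assumed values (a pulse of amplitude 10 and duration
$1/\alpha$, $\tau$ = 0.9 sec, a step size of 0.001 sec), and the function
name is hypothetical. It returns the maximum amplitude reached by each of
the four command nodes, so that $\beta = \alpha$ can be compared with a
larger $\beta$.

```python
import math


def cascade_peaks(beta, alpha=1 / 0.3, n_nodes=4, tau=0.9, amp=10.0,
                  dt=0.001, t_end=8.0):
    """Peak response of each node in a delayed linear cascade (Euler method)."""
    steps = int(round(t_end / dt))
    lag = int(round(tau / dt))          # transmission delay in time steps
    delta = 1.0 / alpha                 # duration of the rectangular pulse
    x = [[0.0] * (steps + 1) for _ in range(n_nodes)]
    for k in range(steps):
        t = k * dt
        pulse = amp if t < delta else 0.0
        for i in range(n_nodes):
            if i == 0:
                drive = pulse           # the first node receives the input pulse
            elif k >= lag:
                drive = beta * x[i - 1][k - lag]  # delayed output of node i-1
            else:
                drive = 0.0
            x[i][k + 1] = x[i][k] + dt * (-alpha * x[i][k] + drive)
    return [max(trace) for trace in x]


if __name__ == "__main__":
    alpha = 1 / 0.3
    for beta in (alpha, alpha / (1 - math.exp(-1))):
        print(round(beta, 3), [round(p, 3) for p in cascade_peaks(beta)])
```

Because the cascade is linear, multiplying $\beta$ by a factor g multiplies
the peak of the nth node by $g^{n-1}$, so any excess in $\beta$ compounds
from node to node.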

The inadvertent amplitude growth in figure 5.1.1 does not detract
from the basic result: a linear cascade of command nodes for an
avalanche is unsatisfactory due to the progressive lengthening of
command node responses. In fact, this effect renders a complex network
of embedding field elements requiring transmission of signals through
many nodes rather impractical. In a later chapter we shall address
this problem directly, but for the time being we shall sidestep it
by introducing a differently configured avalanche.
Figure 5.1.2 shows an avalanche which performs the same theoretical
function as that pictured in figure 2.1.2 without the pulse
lengthening effects. The neurophysiological names given to the new
elements of figure 5.1.2 were suggested by the geometric arrangement
of the nervous system in the cerebellum of vertebrates. The long
axon is a long directed edge. At periodic points along the long axon,
the directed edge splits into a continuation of the long axon and a
group of N branches of the directed edge called a collateral group.
Each of the collaterals has an arrowhead impinging on a grid node.
The distances from the starting command node, $V_c$, to the arrowheads
of the jth collateral group are so arranged that the time elapsed from
excitement of the starting command node to arrival of the prediction
signal at these arrowheads is $j\tau$ time units. In each collateral
arrowhead is located a z process for correlating the prediction signal
$x_c(t - j\tau)$ with the grid node responses. This long axon and
collateral geometry performs the clock function of the avalanche.

For ease of reference, the equations for this avalanche are
given here:

5.1.1   $\dot{x}_c(t) = -\alpha x_c(t) + P_c(t)$

5.1.2   $\dot{x}_i(t) = -\alpha x_i(t) + P_i(t) + \beta \sum_{j=1}^{M} z_{ji}(t)\,x_c(t - j\tau)$

5.1.3   $\dot{z}_{ji}(t) = -u z_{ji}(t) + v\,x_i(t)\,x_c(t - j\tau)$

Equation 5.1.2 is for the response of a grid node in a simple outstar.
We will perform a simple experiment on an avalanche with this
formulation and then change equation 5.1.2 to incorporate lateral
inhibition in our avalanche. The two avalanches thus formed will be
called a simple avalanche and a laterally inhibiting avalanche.
Time does not permit an exhaustive study of avalanches. This
chapter on avalanches is an illustration of the results and problems
of using the outstars studied previously in an avalanche.

section 5.2     A Simple Avalanche

In this section we will use a simple avalanche to learn a time
varying pattern of events. In designing a simple avalanche to do
this, we must first ask what sort of time varying pattern we are
going to have it learn. If we have M collateral groups in our
avalanche and N grid nodes, we must keep track of M x N z processes
during the experiment. To conserve computation time, M x N should
be small. An avalanche with M = 3 collateral groups and N = 2 grid
nodes is chosen. It would be rather unrealistic to expect an avalanche
which takes only three sample data points to approximate a continuous
time varying pattern. Thus we will try to learn a series of time
discrete events. That is, we allow the possibility of the occurrence
of the two events associated with the grid nodes in the environment.
We assume that the events represent time discrete events such as
depressing the key of a piano. We further assume that there is a
minimum time between occurrences of separate patterns of these events,
and we synchronize the avalanche's sampling interval $\tau$ with this
minimum interval. To simplify the experiment still further, we shall
indicate the occurrence of these events with equal amplitude
rectangular input pulses to the appropriate grid node and follow the
convention of the past chapters by making the pulse duration $\delta$
equal to the rise time of a node's response:

     $\delta = 1/\alpha$

With this specification of the allowable input patterns, we have
made the results of the previous chapters applicable to the avalanche.
The other parameters will be specified accordingly:

     $\alpha = 3.3333\ \mathrm{sec}^{-1}$

     $\beta = 1$

     $v = 1.6$  (two presentations for well learning criteria)

     $A = 10$

     $\delta = 1/\alpha = 0.3$ sec.

Figure 5.2.1.   Results of an experiment with a simple outstar avalanche.

We want $\tau$ to be large enough to avoid significant overlapping
of the "pictures" taken by each collateral group. From the phase-
correlation curves of section 3.5, $\tau = 3/\alpha = 3\delta$ should
work. Thus $\tau$ is selected to be:

     $\tau = 3/\alpha = 0.9$ sec.

The memory decay time 1/u is specified to be the time between
successive presentations and/or predictions of the pattern, which is
4 sec. Thus:

     $u = 1/(4\ \mathrm{sec}) = 0.25\ \mathrm{sec}^{-1}$

Figure 5.2.1 shows the pattern presented to the avalanche and the
results. The pattern was presented twice. Symbolically, the pattern
presented was:

     $V_c \rightarrow (V_1, V_2),\ (V_1, 0),\ (V_2, 0)$

The grid node responses following t = 8.8 seconds are the avalanche's
learned prediction of the pattern elicited by the excitement of the
starting command node alone at t = 7.9 seconds.

As can be seen, the avalanche's prediction is not an unqualified
success. Of course $\alpha$ is too small to approximate the input pulses
with any degree of accuracy. Nonetheless, grid node $V_1$ did respond
with two large amplitude responses in a row and grid node $V_2$
responded with large responses spaced $2\tau$ apart as in the input
pattern. However, the third response of $x_1(t)$ and the second
response of $x_2(t)$ show that the avalanche has noticeable "picture
overlapping" error problems. Increasing $\alpha$ and/or using
thresholds would result in a better approximation.
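
The experiment of this section can be approximated with a short
simulation. The Python sketch below is not the program that produced
figure 5.2.1; it is a minimal Euler integration of equations 5.1.1
through 5.1.3 with the parameters listed above. The step size, the exact
presentation schedule (one command pulse every 4 seconds, sample j
presented $j\tau$ after the command pulse, a final cycle with the command
pulse alone) and all names are assumptions made for the illustration.

```python
def simulate_simple_avalanche(presentations=2, alpha=1 / 0.3, beta=1.0,
                              u=0.25, v=1.6, amp=10.0, tau=0.9,
                              period=4.0, dt=0.001):
    """Euler integration of equations 5.1.1-5.1.3 for M = 3, N = 2."""
    # Rows are samples j = 1..3; columns are grid nodes 1..2 (1 = event present).
    pattern = [(1, 1), (1, 0), (0, 1)]
    m, n = len(pattern), len(pattern[0])
    delta = 1.0 / alpha                       # pulse duration
    steps = int(round((presentations + 1) * period / dt))
    lags = [int(round((j + 1) * tau / dt)) for j in range(m)]
    xc = [0.0] * (steps + 1)                  # command node trace
    x = [[0.0] * (steps + 1) for _ in range(n)]
    z = [[0.0] * n for _ in range(m)]         # z[j][i]: collateral group j, node i
    for k in range(steps):
        t = k * dt
        cycle = int(t // period)
        local = t - cycle * period            # time since this cycle's command pulse
        training = cycle < presentations      # the last cycle is prediction only
        p_c = amp if local < delta else 0.0
        # Delayed command signal x_c(t - j*tau) seen by collateral group j.
        xc_delayed = [xc[k - lag] if k >= lag else 0.0 for lag in lags]
        for i in range(n):
            p_i = 0.0
            if training:
                for j in range(m):
                    start = (j + 1) * tau
                    if pattern[j][i] and start <= local < start + delta:
                        p_i = amp
            prediction = beta * sum(z[j][i] * xc_delayed[j] for j in range(m))
            x[i][k + 1] = x[i][k] + dt * (-alpha * x[i][k] + p_i + prediction)
        for j in range(m):
            for i in range(n):
                z[j][i] += dt * (-u * z[j][i] + v * x[i][k] * xc_delayed[j])
        xc[k + 1] = xc[k] + dt * (-alpha * xc[k] + p_c)
    return z, x


if __name__ == "__main__":
    z_final, _ = simulate_simple_avalanche()
    for row in z_final:
        print([round(value, 3) for value in row])
```

In this sketch the z processes for events that are not in the pattern are
not zero; they are driven by the tails of neighboring samples, which is the
"picture overlapping" error discussed above.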

section 5.3     A Laterally Inhibiting Avalanche

Although the results with a simple avalanche were not encouraging,
equations 5.1.1 through 5.1.3 were modified to produce a laterally
inhibiting avalanche for comparison. To convert a simple avalanche to
a laterally inhibiting one, inhibiting directed edges between the grid
nodes must be added and equations 5.1.1 through 5.1.3 changed to:

5.3.1   $\dot{x}_c(t) = -\alpha x_c(t) + P_c(t)$

5.3.2   $\dot{x}_i(t) = -\alpha x_i(t) + P_i(t) + \beta \sum_{j=1}^{M} z_{ji}(t)\,x_c(t - j\tau)\ -\ [\text{lateral inhibition term}]$

5.3.3   $\dot{z}_{ji}(t) = -u z_{ji}(t) + v\,[x_i(t)\,x_c(t - j\tau)]^+$

where:

     $[y]^+ = y$  if $y > 0$
     $[y]^+ = 0$  if $y \le 0$

Figure 5.3.1 shows the results of performing the experiment of
section 5.2 on a laterally inhibiting avalanche. The parameters used
in this experiment were the same as those in section 5.2 except that
v = 2.4 as in the study of the laterally inhibiting outstar. $\tau^-$
and $\beta^-$ are the same as in that study:

     $\tau^- = 0.1$ sec.

     $\beta^- = 2.38$

Figure 5.3.1.   Results of an experiment with a laterally inhibiting
outstar avalanche.

The prediction response of the grid nodes following t = 8.8
seconds in figure 5.3.1 shows that the pattern learned by the avalanche
is definitely not the pattern taught to it. Briefly analyzing the
reasons for this failure, we can see that the deleterious effects of
lateral inhibition all acted in concert. Firstly, the fact that
lateral inhibition diminishes the amplitude of a node's response
when more than one node is excited at the same time resulted in the
responses to presentation of both of the events at the same time at
the beginning of the pattern being diminished. This resulted in a
smaller correlation amplitude for $z_{11}(t)$ and $z_{12}(t)$ when
compared to $z_{21}(t)$, which was the result of the uninhibited
response to the presentation of event 1 alone as the second event of
the pattern. The first two responses of the prediction response of
$x_1(t)$ show this effect.

Secondly, the lengthening of the negative amplitude inhibition
responses due to transmittal through several nodes resulted in a large
inhibitory response in $x_2(t)$ when event 2 was presented alone as the
third event of the pattern. This resulted in a small correlation
amplitude for $z_{32}(t)$ which was insufficient to drive $x_2(t)$
positive in the prediction at the appropriate time.

Additionally, the errors associated with "picture overlapping"
combined with the above resulted in $x_1(t)$ responding to a third event
that was not in the pattern.

If an attempt were made to improve the laterally inhibiting
avalanche's performance, $\beta^-$ should be reduced. It is noted that
if the pattern had been composed on the average of a large number of
events at each sampling with only a few events changing between
samples, the amplitude diminishing effect would not have been as
serious. Due to the large number of nodes in such a pattern, the
resistance to random mistakes composed of a small number of events
would not be compromised with a smaller $\beta^-$.

To avoid both the inhibitory response lengthening and the "picture
overlapping" errors, the interval between samples, $\tau$, should be
increased. Of course this last suggestion seriously compromises the
ability of a laterally inhibiting outstar to accurately approximate
a rapidly varying pattern. Thus solution of the response
lengthening problem of a signal that must be transmitted through
several nodes is important. A solution will be proposed in a later
chapter.

The avalanches presented in this chapter were for illustrative
purposes, to show some of the problems encountered when outstars are
combined into an avalanche. Rather than dwelling upon the design
improvements which could be made to the avalanches, we will go on
to consider other formulations of outstars, which are the basic
components of an avalanche.

CHAPTER  6     THE  VIRTUAL  LATERALLY  INHIBITING  OUTSTAR
section 6.1     Other Outstars Which Control the Maximum Amplitudes
of Grid Node Responses

Lateral inhibition was added to the simple outstar as a means
of using past experience to suppress random mistakes in a pattern.
Its addition was necessitated by the rapid forgetting rate required
to control the amplitudes of prediction responses. There are methods
by which the amplitudes of prediction responses can be controlled
other than by allowing a fast forgetting rate. We will review a few
of them as illustrations of different formulations of the equations
for an outstar and then investigate one of them.

One method of controlling the amplitudes of prediction responses
would be to place an upper bound on the z processes:

6.1.1   $\dot{x}_c(t) = -\alpha x_c(t) + P_c(t)$

6.1.2   $\dot{x}_i(t) = -\alpha x_i(t) + P_i(t) + \beta z_{ci}(t)\,x_c(t - \tau)$

6.1.3   $\dot{z}_{ci}(t) = -u z_{ci}(t) + [M_z - z_{ci}(t)]^+\,v\,x_i(t)\,x_c(t - \tau)$

where:

     $[y]^+ = y$  if $y > 0$
     $[y]^+ = 0$  if $y \le 0$

Equation 6.1.3 limits $z_{ci}(t)$ to values between 0 and $M_z$. $M_z$
is specified such that $\beta M_z x_c(t - \tau)$ produces the maximum
grid response amplitude we are willing to tolerate. This method has
limited random mistake resistance. However, if we specify v such that
it requires several presentations of a pattern to drive a z process to
$M_z$, then the occurrence of one random mistake will result in a
relatively small z amplitude. If u is specified to result in a memory
decay time 1/u approximately equal to the average time interval between
consecutive occurrences of the same random mistake, then equations
6.1.1 through 6.1.3 describe an outstar which has a relatively slow
forgetting rate and amplitude control of the grid node responses.
However, if an outstar governed by this set of equations is confronted
with a random mistake and is then asked to predict the pattern rapidly
for a prolonged period, we can expect the pumping up process to
saturate all the z processes at the value $M_z$, including the z
process associated with the mistake. Thus upper bounding the z
processes to insure that the amplitudes of prediction responses remain
tolerable is not very useful for an outstar functioning in a noisy
environment. Additionally, we could expect that use of a small u would
result in poor correctability, as in the simple outstar.
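
The saturation described above can be illustrated with a small
calculation. The Python sketch below integrates equation 6.1.3 by the
Euler method with the correlation $x_i(t)\,x_c(t - \tau)$ held at a
constant positive value, which is an assumed stand-in for rapid, prolonged
prediction and not a condition taken from the experiments. The function
name and the numerical values are likewise assumptions.

```python
def bounded_z_trace(u, v, m_z, correlation, dt=0.001, t_end=20.0):
    """Euler integration of dz/dt = -u z + [M_z - z]^+ v c with constant c."""
    z = 0.0
    trace = [z]
    for _ in range(int(round(t_end / dt))):
        room = max(m_z - z, 0.0)              # the bracket [M_z - z]^+
        z += dt * (-u * z + room * v * correlation)
        trace.append(z)
    return trace


if __name__ == "__main__":
    trace = bounded_z_trace(u=0.25, v=1.6, m_z=2.0, correlation=3.0)
    # The z process approaches M_z v c / (u + v c), which lies just below M_z.
    print(round(trace[-1], 4), round(2.0 * 1.6 * 3.0 / (0.25 + 1.6 * 3.0), 4))
```

With a sustained correlation, the limiting value depends only weakly on
how the experience was acquired, which is why the z process associated
with a mistake ends up near $M_z$ along with the others.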

A more direct method of controlling the amplitudes of prediction
responses would be to upper bound the grid x processes:

6.1.4   $\dot{x}_c(t) = -\alpha x_c(t) + P_c(t)$

6.1.5   $\dot{x}_i(t) = -\alpha x_i(t) + [M_x - x_i(t)]^+\,(P_i(t) + \beta z_{ci}(t)\,x_c(t - \tau))$

6.1.6   $\dot{z}_{ci}(t) = -u z_{ci}(t) + v\,x_c(t - \tau)\,x_i(t)$

By specifying u in equation 6.1.6 to be small, the outstar
governed by equations 6.1.4 through 6.1.6 would be able to absorb random
mistakes in its experience as did the simple outstar with a slow
forgetting rate in chapter three. The bound on the grid node's x
processes in equation 6.1.5 insures that this outstar will not have
the uncontrolled growth of prediction responses that the slowly
forgetting simple outstar did. However, in this outstar, a large
$z_{ci}(t)$ would result in a maximum prediction input signal to a grid
node of magnitude $M_x$ for as long as
$z_{ci}(t)\,x_c(t - \tau) \ge M_x$. Because the prediction signals have
exponentially decaying tails, this would result in the effective
duration of the maximum prediction signal input getting longer as the
$z_{ci}(t)$ process got larger. Thus while being able to control the
amplitude of grid node prediction responses, we would not be able to
control the duration of the responses. In an outstar, we have absolute
control over the shape and amplitude of the prediction signal
$x_c(t - \tau)$ by control of the input pulse to the command node. Thus
by specifying the input pulses we can analytically compute what the
prediction signal looks like. With this knowledge, a threshold
$\Gamma_c$ could be placed on the command node to guarantee that the
prediction signal $[x_c(t - \tau) - \Gamma_c]^+$ is nonzero only over a
specified interval of time. By so restricting the duration of the
prediction signal we could also limit the duration of the grid node's
prediction responses. Again, the small u resulting in good random
mistake resistance could be expected to result in poor correctability.
The properties of such an outstar would be interesting to investigate,
but time did not allow an investigation in this study.
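
The interval over which a thresholded prediction signal is nonzero can
be computed directly from the command node equation. The Python sketch
below is an illustration only; it assumes a rectangular command pulse of
amplitude A and duration $\delta$ applied to a command node at rest, and
the function names and the threshold value are hypothetical. The times
returned are measured from the onset of the command pulse, so the interval
at the arrowheads is obtained by adding $\tau$ to both.

```python
import math


def command_response(t, alpha, amp, delta):
    """x_c(t) for a rectangular pulse of height amp on [0, delta], x_c(0) = 0."""
    if t < 0:
        return 0.0
    if t <= delta:
        return (amp / alpha) * (1.0 - math.exp(-alpha * t))
    peak = (amp / alpha) * (1.0 - math.exp(-alpha * delta))
    return peak * math.exp(-alpha * (t - delta))


def suprathreshold_window(gamma, alpha, amp, delta):
    """Times between which x_c(t) exceeds gamma; None if it never does."""
    if gamma <= 0:
        raise ValueError("the threshold must be positive")
    peak = (amp / alpha) * (1.0 - math.exp(-alpha * delta))
    if gamma >= peak:
        return None
    t_on = -math.log(1.0 - alpha * gamma / amp) / alpha   # rising crossing
    t_off = delta + math.log(peak / gamma) / alpha        # falling crossing
    return t_on, t_off


if __name__ == "__main__":
    alpha, amp, delta = 1 / 0.3, 10.0, 0.3
    window = suprathreshold_window(1.0, alpha, amp, delta)
    print([round(t, 3) for t in window])
```

Raising the threshold shortens the interval, at the cost of a smaller
prediction signal amplitude.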

Another method of controlling the grid node prediction response
amplitude, which we will study, would be to make the prediction input
signal to the grid nodes linearly proportional to the probabilities
$y_i(t)$ which define the outstar's memory of a pattern. By the outstar
theorem, the $y_i(t)$ converge to the pattern probabilities $\theta_i$,
which are constant. Thus when the $y_i(t)$ have converged sufficiently
close to the $\theta_i$, we could expect the prediction signal inputs to
the grid nodes, $\beta y_i(t)\,x_c(t - \tau)$, to be the same
independent of the amplitudes of the $z_{ci}(t)$ processes. As
$y_i(t) \le 1$, specifying $\beta$ would determine the maximum possible
prediction amplitude of the grid node's responses. Additionally,
specifying the u of the z processes to be small would allow absorption
of random mistakes in accumulated past experience. The equations for
such an outstar are:

6.1.7    $\dot{x}_c(t) = -\alpha x_c(t) + P_c(t)$

6.1.8    $\dot{x}_i(t) = -\alpha x_i(t) + P_i(t) + \beta y_i(t)\,x_c(t - \tau)$

6.1.9    $\dot{z}_{ci}(t) = -u z_{ci}(t) + v\,x_c(t - \tau)\,x_i(t)$

6.1.10   $y_i(t) = z_{ci}(t) \Big/ \sum_{j=1}^{N} z_{cj}(t)$

Another attractive property of an outstar governed by these equations
is that equal prediction signals will result in equal grid node
responses independent of the amplitudes of the z processes. Thus we
could say that the memory of a pattern is always fresh in such an
outstar's memory and pumping up is not required.
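Equations 6.1.7 through 6.1.10 can be stepped numerically. The following is a minimal sketch, not the simulation program used in this study: it assumes a forward-Euler scheme, a hypothetical step size dt = 0.001 sec., and the helper names rect and simulate_outstar, none of which appear in the text. The grid event pulse is assumed to be delayed by $\tau$ so that it coincides with the prediction signal.

```python
def rect(t, start, width, amp):
    """Rectangular pulse of height amp on the interval [start, start + width)."""
    return amp if start <= t < start + width else 0.0


def simulate_outstar(p_c, p_grid, dt=0.001, alpha=3.3333, beta=4.77,
                     tau=0.3, u=0.01, v=4.77, z0=0.1):
    """Forward-Euler integration of equations 6.1.7 through 6.1.10.

    p_c    : list of command inputs P_c, one sample per time step
    p_grid : list of rows, each row holding the N grid inputs P_i
    Returns (x_c history, y history, final z values).
    """
    steps, n = len(p_c), len(p_grid[0])
    delay = int(round(tau / dt))      # transmission delay in time steps
    x_c = [0.0] * steps
    x = [0.0] * n
    z = [z0] * n                      # equal nonzero start avoids 0/0
    y_hist = []
    for k in range(steps):
        total = sum(z)
        y = [z_i / total for z_i in z]                          # 6.1.10
        y_hist.append(y)
        if k == steps - 1:
            break
        # delayed command signal x_c(t - tau), zero until the delay elapses
        x_del = x_c[k - delay] if k >= delay else 0.0
        x_c[k + 1] = x_c[k] + dt * (-alpha * x_c[k] + p_c[k])   # 6.1.7
        new_x = [x[i] + dt * (-alpha * x[i] + p_grid[k][i]
                              + beta * y[i] * x_del)
                 for i in range(n)]                             # 6.1.8
        z = [z[i] + dt * (-u * z[i] + v * x_del * x[i])
             for i in range(n)]                                 # 6.1.9
        x = new_x
    return x_c, y_hist, z


# Demo: one presentation of the command event with grid event 2 (N = 3).
dt = 0.001
steps = 2000
p_c = [rect(k * dt, 0.0, 0.3, 10.0) for k in range(steps)]
p_grid = [[0.0, rect(k * dt, 0.3, 0.3, 10.0), 0.0] for k in range(steps)]
x_c, y_hist, z_final = simulate_outstar(p_c, p_grid, dt=dt)
print("final y:", [round(val, 3) for val in y_hist[-1]])
```

Because the $y_i(t)$ are recomputed from the z values at every step, they always sum to one, whatever amplitudes the z processes reach.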

A close examination of equations 6.1.7 through 6.1.10 shows
that an outstar governed by these equations is a laterally inhibiting
outstar. By lateral inhibition we mean the ability of a grid node
responding with large amplitude to diminish the amplitude of grid
nodes responding with lesser amplitudes. From equation 6.1.9, a grid
node responding with a large amplitude will result in a large
correlating amplitude for the associated $z_{ci}(t)$ process. This will
result in a large probability $y_i(t)$ from equation 6.1.10, which in
turn will allow a larger prediction signal input in equation 6.1.8. At
the same time a large $z_{ci}(t)$ will result in a smaller $y_j(t)$ for
nodes not responding with large amplitudes, by the inclusion of
$z_{ci}(t)$ in the denominator of equation 6.1.10 for $y_j(t)$. This in
turn will result in a smaller input prediction signal in equation 6.1.8
for $x_j(t)$. As can be seen from equation 6.1.10, the accumulated past
experience of the outstar in the $z_{ci}(t)$ processes plays a major
part in this lateral inhibition, and thus the past experience can be
counted upon to inhibit the effects of a random mistake. An outstar
governed by these equations combines absorption of random mistakes and
active inhibition of them.
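The inhibition through the shared denominator of equation 6.1.10 can be seen in a small numerical sketch. The function name probabilities and the z values used here are illustrative assumptions, not data from the experiments.

```python
def probabilities(z):
    """Equation 6.1.10: y_i = z_ci / sum_j z_cj."""
    total = sum(z)
    if total <= 0.0:
        raise ValueError("at least one z_ci must be positive")
    return [z_i / total for z_i in z]


# Equal experience on three nodes, then growth of z_c2 alone.
before = probabilities([0.1, 0.1, 0.1])
after = probabilities([0.1, 0.5, 0.1])
print(before, after)
```

Only the second z value was changed, yet the y values of the first and third nodes fall because all three share the same denominator.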

The major drawback of such an outstar is that it is not consistent
with the elements of embedding field theory presented in chapter one.
There, neat geometric elements performing one function each were
presented. Because the $y_i(t)$'s perform the prediction signal
amplification function for this outstar, they should be located in the
arrowheads of the directed edges with the z processes. This raises the
problem of how the $z_{ci}(t)$'s from each of the arrowheads of directed
edges from the command node are made simultaneously available at all the
arrowheads to form the $y_i(t)$'s. We have constrained all other
information transmissions in the outstar to finite velocities along
directed edges. Because the $z_{ci}(t)$'s are instantaneously available
at all the arrowheads without any apparent means of traveling between
the arrowheads, the $y_i(t)$ is a virtual process. The outstar described
by equations 6.1.7 through 6.1.10 is therefore called a virtual
laterally inhibiting outstar.

Although the virtual $y_i(t)$ process is not consistent with the
elements of embedding field networks presented in chapter one, we
will study the performance of a virtual laterally inhibiting outstar.
Grossberg has done considerable theoretical work with it (Ref. 7).
In the realm of theory, there is no reason why a virtual process should
be excluded from consideration. A virtual process does not present
any difficulties to a digital simulation either. Moreover, if we
were to build electrical devices to make an outstar with, we would
have more trouble engineering the transmission delays for prediction
signals than engineering the virtual $y_i(t)$ processes. The only
place where the virtual processes are clearly inapplicable is in the
nervous system of living organisms, where all information transmissions
from one point in the system to another are at a finite velocity.
Whereas a virtual laterally inhibiting outstar is not useful as a
model for nervous systems, it is a legitimate device for study.

section 6.2     Specifying the Parameters in a Virtual Laterally
Inhibiting Outstar

We will perform the same experiment on a virtual laterally inhibiting
outstar as has already been performed on the simple and laterally
inhibiting outstars. Therefore the parameters of the virtual laterally
inhibiting outstar are specified to be the same as in the other outstars
except where there are special considerations to be made:

Input parameters:

   A = 10
   $\delta = 1/\alpha = 0.3$ sec.

Network parameters:

   $\alpha = 3.3333$ sec.$^{-1}$
   $\tau = 0.3$ sec.
   N = 3

Initial conditions on $x_c$ and all $x_i$ are zero.

Selection of $\beta$, u, v, and the initial conditions on the z
processes will require some discussion.

As the $y_i(t)$ are ratios of $z_{ci}(t)$ to the sum of all
$z_{ci}(t)$, we want at least one of the $z_{ci}$ to have a nonzero
initial condition to avoid the problem of dividing by zero. The initial
value should not be too large, to avoid biasing the network at the
beginning of the experiment. Therefore at least one $z_{ci}$ will be
specified to have an initial condition of 0.1. Again, to prevent biasing
of the network in favor of predicting any one grid event, all the
$y_i(t)$ should be approximately equal. This is accomplished if the
initial conditions on all the $z_{ci}$ are equal. Therefore:

   $z_{ci}(0) = 0.1$   for i = 1, 2, 3

Notice that this means that there is a nonzero initial condition on the
$y_i(t)$'s:

   $y_i(0) = 0.3333$   for i = 1, 2, 3
This means that the prediction signal at the beginning of the experiment
is split up evenly between all the nodes in the grid. A prediction
made in the initial state of the experiment will result in all grid
nodes responding equally. We must accordingly modify our interpretation
of what grid node responses mean. Heretofore we have considered the
outstar to be in a state of complete ignorance at the beginning of
an experiment. In the simple and laterally inhibiting outstars this
state of initial ignorance was specified by making the initial
conditions on the z processes zero. A prediction by one of those
outstars while it was in its initial state resulted in no response of
the grid nodes. Thus we were able to reinforce our interpretation of
initial ignorance by saying that there was nothing in the outstar's
memory and the outstar could predict nothing. The virtual laterally
inhibiting outstar does not have this nicety.

We will interpret the prediction responses of a laterally
inhibiting outstar to indicate total ignorance if all grid nodes respond
with the same amplitude. Equivalently, total ignorance is the state
in which all $y_i(t)$ are equal. Note that this interpretation means
that the pattern composed of all the events represented by nodes in
the grid is not perceivable by the outstar. Excitation of all grid
nodes will result in the same values of the $y_i(t)$ as they have
initially. This is equivalent to saying that white light is the same as
complete darkness in this outstar. Thus an intelligible pattern must be
composed of fewer than N events. In our experiment the pattern is
composed of one event out of three and thus is intelligible.

In previous outstars, v has been selected on a "so many presentations
mean well learning" criterion. This was due to the fact that the
prediction signal amplification process, the $z_{ci}(t)$, had to grow to
a certain amplitude before a prediction would drive the grid nodes to
the same amplitudes as presentation of the pattern externally would
drive them. In the virtual laterally inhibiting outstar, this criterion
for v is meaningless. The prediction signal amplification processes
are the $y_i(t)$, which by the outstar theorem are always less than or
equal to unity no matter what the amplitudes of the z processes are.
Thus small amplitude z processes will result in the same amplitude grid
node responses as large amplitude z processes as long as the ratios
$z_{ci}(t)\,\big[\sum_{j=1}^{N} z_{cj}(t)\big]^{-1}$ remain the same.
Thus specification of v has nothing to do with the amplitude of grid
node responses.
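The independence from the amplitude of the z processes can be checked with a short sketch. The function name prediction_inputs and the sample values below are assumptions made for illustration; only the form $\beta y_i(t) x_c(t - \tau)$ comes from equation 6.1.8.

```python
def prediction_inputs(z, beta, x_c_delayed):
    """Prediction input beta * y_i * x_c(t - tau) for each grid node,
    with y_i taken from equation 6.1.10."""
    total = sum(z)
    return [beta * (z_i / total) * x_c_delayed for z_i in z]


small = prediction_inputs([1.0, 3.0, 1.0], beta=4.77, x_c_delayed=1.0)
large = prediction_inputs([100.0, 300.0, 100.0], beta=4.77, x_c_delayed=1.0)
print(small, large)
```

Multiplying every z by the same factor leaves the prediction inputs unchanged, and no input can exceed $\beta x_c(t - \tau)$ because each $y_i(t)$ is at most one.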

$\beta$, on the other hand, has a great deal to do with the amplitude
of the grid node responses. In previous outstars we have tried to
control the grid node responses so that their amplitudes during a
prediction were approximately equivalent to those attained by excitement
by an event. As v cannot be used for that purpose in this outstar,
we will use $\beta$. With this intention, we run into the usual problem
with an outstar possessing some form of lateral inhibition. That is,
we would like to know how many events on the average compose a pattern.
In a laterally inhibiting outstar we saw that a $\beta$ selected for an
average of a small number of events in a pattern resulted in inefficient
learning of a pattern composed of many more events. Nevertheless,
with sufficient instruction and/or predictions, the laterally
inhibiting outstar is able to "well learn" a pattern more complicated
than it was designed to learn.

In the virtual laterally inhibiting outstar, we do not have this
possibility for well learning a pattern more complicated than ones
the network is designed to learn. If we have M < N events on the
average in a pattern, then the expected value for the $y_i(t)$
corresponding to events in a pattern is $y_i(t) = 1/M$ after learning
has occurred. The $y_i(t)$ for events not in the learned pattern are
small. Now, we can specify $\beta$ such that:

   $\beta = bM$

where b is a constant necessary to result in a well learned grid
prediction response for a pattern composed of one event. With this
$\beta$, the input prediction signal to a node representing an event in
the learned pattern is:

   $y_i(t)\,\beta\, x_c(t - \tau) = (1/M)\, bM\, x_c(t - \tau) = b\, x_c(t - \tau)$

and thus we get well learned responses.

However, if there are fewer than M events in the pattern learned,
the prediction responses will be larger. If there are more than M events
in the pattern learned, the prediction responses will be smaller.
Because the $y_i(t)$ do not change once the pattern is learned, there is
no possibility of changing this situation.
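The dependence of the prediction response on the number of events actually learned can be made concrete. This is an illustrative sketch: the function name prediction_input_gain and the symbol K for the actual number of learned events are assumptions, and each learned event is idealized as having $y_i = 1/K$ exactly.

```python
def prediction_input_gain(b, m_design, k_events):
    """Gain applied to x_c(t - tau) at a node in the learned pattern.

    beta is chosen as b * M for a design average of M events, while a
    pattern of K learned events is idealized as having y_i = 1 / K.
    """
    beta = b * m_design
    y_learned = 1.0 / k_events
    return beta * y_learned


for k in (1, 2, 4):
    print(k, prediction_input_gain(b=1.0, m_design=2, k_events=k))
```

With M = 2, a pattern of one event is driven with gain 2b, a pattern of two events with gain b, and a pattern of four events with gain b/2, which is the over- and under-response described above.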

Thus the well learning criterion is an unrealistic requirement
for a virtual laterally inhibiting outstar that is confronted with the
possibility of learning a wide variety of patterns. The well learning
criterion was originally introduced because we adopted the convention
of reading the amplitudes of the x processes at the nodes as the
response of a node. As the measurement of very small or very large
amplitudes was impractical, the well learning criterion was adopted as
a measurement standard. For the virtual laterally inhibiting outstar
we could devise another virtual process to interpret grid node responses.
For instance, the probabilities:

   $X_i(t) = x_i(t)\,\big[\sum_{j=1}^{N} x_j(t)\big]^{-1}$

would be suitable. However, as the pattern we will teach the outstar
in this experiment is simple and we know that it will be composed
of at most one event, we can retain the well learning criterion for
interpretation. In a more general situation the above discussion
must be considered.

Since we are going to teach the outstar a pattern composed of
at most one event, and we are going to specify $\beta$ according to the
well learning criterion, we can make a quick estimation of what $\beta$
should be.

The input prediction signal to the grid node corresponding to the
event in the pattern should have a maximum amplitude equivalent to the
maximum amplitude of an input pulse:

   $\beta\, y_i(t)\, \max_t x_c(t - \tau) = A$

For one event in the pattern, $y_i(t) = 1.0$ after learning. Therefore
we want:

   $\beta \max_t x_c(t - \tau) = \beta (A/\alpha)(1 - e^{-\alpha\delta}) = \beta (A/\alpha)(1 - e^{-1}) = 0.63\,\beta\,(A/\alpha) = A$

or:

   $\beta = \alpha / 0.63 = 5.28$

Experimentally the appropriate value of $\beta$ was found to be:

   $\beta = 4.77$

The 11% error is due to both the naivete of the estimation and the error
inherent in the digital simulation.
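The estimate can be reproduced directly. The sketch below assumes the peak of $x_c$ under a rectangular pulse of height A and duration $\delta$ is $(A/\alpha)(1 - e^{-\alpha\delta})$, as used in the estimate above; the function name estimate_beta is hypothetical. Because it uses the unrounded value of $1 - e^{-\alpha\delta}$, the result differs slightly from the hand figure obtained with 0.63.

```python
import math


def estimate_beta(alpha, delta):
    """beta such that beta * max x_c equals the pulse amplitude A.

    max x_c = (A / alpha) * (1 - exp(-alpha * delta)), so A cancels and
    beta = alpha / (1 - exp(-alpha * delta)).
    """
    return alpha / (1.0 - math.exp(-alpha * delta))


beta_estimate = estimate_beta(alpha=3.3333, delta=0.3)
beta_experimental = 4.77
relative_error = (beta_estimate - beta_experimental) / beta_experimental
print(round(beta_estimate, 3), round(100.0 * relative_error, 1))
```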

Having specified $\beta$, we will arbitrarily specify v to be equal to
$\beta$:

   $v = \beta = 4.77$

Only u remains to be specified. Since it is claimed that a virtual
laterally inhibiting outstar can use the large z's resulting from a
small u to absorb random mistakes, we will specify u to be small:

   $u = 0.01$ sec.$^{-1}$

Note again that a small u means that the decay time of the z process,
1/u, is large compared to the presentation and/or prediction interval
to be used in the experiment.



section 6.3     Results of the Experiments with a Virtual Laterally
Inhibiting Outstar


Figure 6.3.1 shows the results of presenting the pattern
$V_c \to V_2$ to the virtual laterally inhibiting outstar twice and then
asking for a prediction of the pattern. As can be seen from the $x_2(t)$
trace, $V_c \to V_2$ was well learned. $V_c \to V_3$ was learned
slightly due to the prediction signal's "tail". (Event 3 was presented
with presentation phase $\varphi = +2\delta$ with respect to the
prediction signal.) Also note that $x_1(t)$ responds to prediction
slightly although event 1 has not been presented to the outstar.

Looking at the $y_i(t)$ traces in figure 6.3.1 we can see why. All
three $y_i(t)$ started with the same initial values $y_i(0) = 0.3333$
for i = 1, 2, 3. The first presentation of the pattern resulted in
$y_2(t)$ rising to a maximum value of nearly 0.8 while $y_1(t)$ and
$y_3(t)$ decreased to about 0.1 each. When event 3 was presented
$2\delta$ after event 2, the $y_i(t)$ changed slightly due to
correlation between the prediction signal's tail and $x_3(t)$. Note that
on the second presentation of the pattern, $y_1(t)$ decreased again and
$y_2(t)$ increased. According to the outstar theorem, more presentations
of the pattern [...] because of correlation between the tail of the
prediction signal and $x_3(t)$. However, in the two presentations in
figure 6.3.1, $y_1(t)$ is still large enough to allow some prediction
signal through to excite $x_1(t)$.

If we remember that it was agreed to interpret an equal response
from each of the grid nodes as no response, then we can place imaginary
thresholds, $\Gamma_i$, on the $x_i(t)$ traces. The $\Gamma_i$ shown by
way of the third response on the $x_i(t)$ trace was chosen such that if
$y_i(t) = 0.3333$ for i = 1, 2, 3 in the outstar, all grid node
prediction responses would be subthreshold. Thus, by interpreting a node
as not responding until it is suprathreshold, we can interpret the
results in figure 6.3.1 as saying that only $V_c \to V_2$ was learned by
the outstar. The results of performing an experiment on a virtual
laterally inhibiting outstar with real, versus imaginary, thresholds
will be reported later in this chapter.

Of interest is the fact that the $y_i(t)$ did not change during the
prediction. This was an outstar theorem guarantee which is now
experimentally verified.

Figure 6.3.2 shows the results of continuing the experiment. A
simulated random mistake was presented with the pattern by presenting
event 1 at the same time as event 2 was presented. Note that on
subsequent predictions, $x_1(t)$ remained subthreshold. It can be
concluded that this virtual laterally inhibiting outstar is resistant to
random mistakes. However, looking at the $y_i(t)$ traces, it can be seen
that the random mistake did reduce $y_2(t)$ and this effect persisted
through subsequent predictions. Thus, even though the prediction
responses of $x_1(t)$ are subthreshold, the $y_i(t)$ remember the
mistake. It will take several presentations of the correct pattern to
undo the effect of the random mistake. In the discussion of using large
amplitude z processes to absorb mistakes in section 3.4 it was shown
that the z processes would reflect the conditional probabilities
$PR_{i/c}$. Up to the end of the experiment in figure 6.3.2, the c event
has been presented 6 times. Event 2 has been presented 3 times and event
1 has been presented 1 time. Using the past history of the occurrence of
the events in the environment to estimate the conditional probabilities
$PR_{1/c}$ and $PR_{2/c}$ we get:

   $PR_{1/c} = 1/6 = 0.1666$

   $PR_{2/c} = 3/6 = 0.5$

The ratio $PR_{1/c} / PR_{2/c} = 0.3333$. At the end of the experiment
in figure 6.3.2, $y_1(t) = 0.15$ and $y_2(t) = 0.6666$. The ratio
$y_1(t)/y_2(t)$ is:

   $y_1(t) / y_2(t) = (0.15)/(0.6666) = 0.225$

As the $y_i(t)$ are directly proportional to the $z_{ci}(t)$, the above
calculations show that the virtual laterally inhibiting outstar is
more resistant to random mistakes than would be expected if it were just
using large amplitude z processes to absorb mistakes. On the other hand,
we can expect the large z's to reflect the statistics of the
environment somewhat, and the inhibitory mechanism of the outstar is
not sufficient to completely overcome this. Thus some effect on the
$y_i(t)$'s must be expected from the statistics of the environment.
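The comparison of the two ratios is simple arithmetic and can be restated as a short check. The counts and the y values are those quoted from figure 6.3.2; the variable names are illustrative.

```python
# Presentation counts up to the end of the experiment in figure 6.3.2.
command_presentations = 6
event_1_presentations = 1
event_2_presentations = 3

pr_1_given_c = event_1_presentations / command_presentations
pr_2_given_c = event_2_presentations / command_presentations
ratio_environment = pr_1_given_c / pr_2_given_c

# y values read from the traces at the end of the experiment.
y_1, y_2 = 0.15, 0.6666
ratio_outstar = y_1 / y_2

print(round(ratio_environment, 4), round(ratio_outstar, 3))
```

The outstar's ratio is smaller than the environmental ratio, which is the sense in which the mistake is suppressed beyond what absorption alone would give.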

Figure 6.3.3 shows the results of continuing the experiment and
trying to correct the learned pattern $V_c \to V_2$ with the pattern
$V_c \to V_1$ by presenting event 1 three times in a row. As can be
seen, the correction attempt was not successful. Looking at the
$z_{ci}(t)$ traces, it can be seen that the past accumulated experience
of $V_c \to V_2$ in the large $z_{c2}(t)$ is so great that although the
accumulated experience of $V_c \to V_1$ in $z_{c1}(t)$ is increasing, it
will require many more presentations of $V_c \to V_1$ to say that the
outstar has corrected the mistake. This was a phenomenon noticed in the
slowly forgetting simple outstar also. Even though this outstar does
laterally inhibit, it is not surprising that a large amount of
experience with a pattern will make it difficult to convince the outstar
that the pattern is a mistake. In order to improve the virtual laterally
inhibiting outstar's correctability, the forgetting rate u will have to
be increased.



section 6.4     A Virtual Laterally Inhibiting Outstar with Thresholds
and an Intermediate Forgetting Rate Designed to Learn
Patterns of More than One Event

In the previous section, it was concluded that the addition of
thresholds to a virtual laterally inhibiting outstar would be an aid
to the interpretation of responses. It was also concluded that a faster
forgetting rate would increase correctability. In this section, we will
test these conclusions. Additionally, it would be instructive to see
what happens when the pattern being taught to the outstar is composed
of more than one event.

In order to have sufficient possibilities available to study
teaching an outstar a pattern composed of more than one event, the number
of grid nodes, N, will be increased to N = 5. We will specify $\beta$ to
result in a well learned response for patterns composed of an average
M = 2.5 events. The input pulse parameters, the x process rise rate
$\alpha$, and the transmission delay $\tau$ will be kept the same as in
section 6.2. The following parameters are therefore specified:

Rectangularly shaped input pulses:

   A = 10
   $\delta = 0.3$ sec.

   $\alpha = 3.3333$ sec.$^{-1}$ $= 1/\delta$
   $\tau = 0.3$ sec.

Since thresholds are to be added to the outstar, the equations
governing its performance will have to be changed:

6.4.1   $\dot{x}_c(t) = -\alpha x_c(t) + P_c(t)$

6.4.2   $\dot{x}_i(t) = -\alpha x_i(t) + P_i(t) + y_i(t)\,\beta\,[x_c(t - \tau) - \Gamma_c]^+$

6.4.3   $\dot{z}_{ci}(t) = -u z_{ci}(t) + v\,[x_c(t - \tau) - \Gamma_c]^+\,[x_i(t) - \Gamma_x]^+$

6.4.4   $y_i(t) = z_{ci}(t)\,\big[\sum_{j=1}^{N} z_{cj}(t)\big]^{-1}$
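The right-hand sides of equations 6.4.1 through 6.4.4 can be written as one function. This is a sketch under stated assumptions: $[\,\cdot\,]^+$ is read as "zero when the argument is negative", the threshold symbols are written gamma_c and gamma_x, and the function names pos and derivatives are hypothetical.

```python
def pos(value):
    """Threshold bracket [value]^+ : zero unless value is positive."""
    return max(value, 0.0)


def derivatives(x_c, x_c_delayed, x, z, p_c, p, alpha, beta, u, v,
                gamma_c, gamma_x):
    """Right-hand sides of 6.4.1 through 6.4.3, with y from 6.4.4.

    x_c_delayed is x_c(t - tau); x, z and p are lists over the grid nodes.
    """
    total = sum(z)
    y = [z_i / total for z_i in z]                               # 6.4.4
    signal = pos(x_c_delayed - gamma_c)     # thresholded prediction signal
    dx_c = -alpha * x_c + p_c                                    # 6.4.1
    dx = [-alpha * x_i + p_i + y_i * beta * signal
          for x_i, p_i, y_i in zip(x, p, y)]                     # 6.4.2
    dz = [-u * z_i + v * signal * pos(x_i - gamma_x)
          for z_i, x_i in zip(z, x)]                             # 6.4.3
    return dx_c, dx, dz, y


params = dict(alpha=3.3333, beta=27.9, u=0.278, v=10.0,
              gamma_c=1.0, gamma_x=0.45)
# Subthreshold command signal: the z processes can only decay.
quiet = derivatives(0.0, 0.5, [2.0, 0.0], [0.1, 0.1], 0.0, [0.0, 0.0],
                    **params)
# Suprathreshold command signal: only the suprathreshold node correlates.
active = derivatives(0.0, 2.0, [2.0, 0.2], [0.1, 0.1], 0.0, [0.0, 0.0],
                     **params)
print(quiet[2], active[2])
```

The parameter values in the demo are the ones specified later in this section; the state values are arbitrary.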

Now we are faced with the problem of assigning values to the
thresholds $\Gamma_c$ and $\Gamma_x$. In section 3.5 it was concluded
that putting thresholds on the grid node x processes of a simple outstar
was inadvisable because this would result in eventual extinction of all
memory. This was due to the fact that the z processes decayed
exponentially at the rate u. It was quite possible for the z's to decay
until the prediction input signal
$\beta z_{ci}(t)\,[x_c(t - \tau) - \Gamma_c]^+$ to the grid nodes was
unable to drive the grid node x process suprathreshold. In this
situation the outstar could no longer "pump up" the z process because
the correlating signal
$v\,[x_c(t - \tau) - \Gamma_c]^+\,[x_i(t) - \Gamma_x]^+$ would be zero.
However, in the virtual laterally inhibiting outstar, we do not have
this problem. The prediction signal amplification processes are the
$y_i(t)$, which do not decay. Thus we may specify a nonzero $\Gamma_x$
in equation 6.4.3.

In fact, use of a grid node threshold is advantageous in a virtual
laterally inhibiting outstar. Besides the interpretive advantage
discussed in section 6.3, there is a real improvement of performance.
Since the convention for interpreting the responses of a virtual
laterally inhibiting outstar says that equal responses by all the nodes
in the grid is a state of total ignorance, we have specified equal
initial conditions on the $y_i(t)$'s. That is, $y_i(t) = 1/N$ for all i.
Now suppose that we have a virtual laterally inhibiting outstar in
a state of total ignorance. This means that we have not presented
an intelligible pattern of grid events with the command event. However,
it does not mean that the command event alone has not been presented
to the outstar. In fact, until we decide to teach the outstar that the
command event is associated with an intelligible pattern, we may excite
the command node as many times as we like. Because the prediction
signal so generated is being split up evenly between the grid nodes, the
$y_i(t)$ will not deviate from a state indicative of total ignorance.
However, the correlating signal $v\, x_c(t - \tau)\, x_i(t)$ will become
positive on each such ignorant prediction and the $z_{ci}(t)$ will grow.
We had great difficulty correcting a learned mistake in section 6.3
because the experience with the erroneous pattern was great. If the
outstar is allowed to accumulate experience with the ignorant pattern by
spurious excitements of the command node, then it will be equally
difficult to correct the ignorant pattern with an intelligible one.

Of course, increasing the forgetting rate should partially
alleviate this problem. However, it would be better to prevent the
outstar from accumulating experience with the ignorant pattern
altogether. A properly selected grid node threshold $\Gamma_x$ would
achieve this result. In the state of initial ignorance, the amplitude of
prediction signal inputs to the grid nodes is:

   $(\beta/N)\, x_c(t - \tau)$

as $y_i(t) = 1/N$ for all i. Suppose $\beta$ has been specified to
result in a well learned response for an average of M < N events to a
pattern. Then the ignorant state input prediction signal is:

   $(bM/N)\, x_c(t - \tau)$

where b is a constant which results in a well learned response from a
grid node when $b\, x_c(t - \tau)$ is the prediction input signal. Now a
well learned prediction response is one in which the maximum amplitude
of the response is equal to the maximum amplitude of a response elicited
by an event input pulse alone. Knowing the shape, amplitude, and
duration of the input pulses, the maximum amplitude of a well learned
response can be analytically calculated. For the input pulses of this
experiment, it is:

   $x_{\max}$ = max amplitude of well learned response $= (A/\alpha)(1 - e^{-1}) = 0.63\,(A/\alpha)$

Thus the proper $\Gamma_x$ to prevent accumulation of experience with
the ignorant pattern may be analytically specified by:

   $\Gamma_x$ = max amplitude of prediction of the ignorant pattern response $= (M/N)(0.63\, A/\alpha)$

Knowing that M = 2.5, N = 5, A = 10, $\alpha = 3.333$:

   $\Gamma_x = 0.945$
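The threshold value can be recomputed from the stated parameters. In this sketch the function name ignorant_pattern_threshold is hypothetical, and the unrounded value of $1 - e^{-\alpha\delta}$ is used in place of 0.63, so the result agrees with the figure above only to about two decimal places.

```python
import math


def ignorant_pattern_threshold(m, n, amplitude, alpha, delta):
    """(M / N) times the peak of a well learned response.

    The peak response to a rectangular pulse of the given amplitude and
    duration delta is (A / alpha) * (1 - exp(-alpha * delta)).
    """
    peak = (amplitude / alpha) * (1.0 - math.exp(-alpha * delta))
    return (m / n) * peak


gamma_x = ignorant_pattern_threshold(m=2.5, n=5, amplitude=10.0,
                                     alpha=3.333, delta=0.3)
print(round(gamma_x, 3))
```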

Note that this $\Gamma_x$ will work only for the input pulses specified.
Outstars are capable of learning patterns independent of the vigor
with which they are presented. They are also capable of learning
patterns composed of events presented at different strengths. Of course,
in a threshold outstar, there is a minimum pulse amplitude A which will
result in suprathreshold responses and thus learning. In this study
it was decided to maintain the specifications on the input pulses
constant because a large number of outstars are being studied. A
detailed study of varying the input pulse specifications in each outstar
requires a prohibitive amount of time. In an outstar functioning
in an environment in which events occur with varied amplitudes, a
statistically average well learned response could be used to specify
a $\Gamma_x$ sufficient to prevent accumulation of experience with the
ignorant pattern on the average. However, this is not a study that will
be undertaken in this paper. In this study we are able to completely
know ahead of time the exact specifications of our input pulses and are
consequently able to specify the parameters of the outstars to result
in the performance we want.

Unfortunately, the above analytic method was not completely
understood at the time the experiment being reported was performed.
$\Gamma_x = 0.45$ was used and consequently the outstar was able to
accumulate experience with the ignorant pattern. Rather than re-perform
the experiment with the "correct" $\Gamma_x$, it was decided to present
the data collected with the "wrong" $\Gamma_x$. It illustrates the
problem of accumulating experience with the ignorant pattern.
Additionally, examination of the data will reveal that there are other
properties associated with any nonzero $\Gamma_x$ which are of more
consequence than the property of preventing accumulation of experience
with the ignorant pattern.

It was decided to specify the command node threshold, $\Gamma_c$, such
that there would be no correlation with events presented with
presentation phase $\varphi$ greater than $\varphi = \delta = 0.3$
seconds. From previous experimental data, $\Gamma_c = 1.0$ will satisfy
this criterion.

Addition of a nonzero threshold $\Gamma$ made the analytical
specification of $\beta$ too difficult. Thus a $\beta$ resulting in a
well learned response for a pattern composed of M = 2.5 events was
experimentally determined. The value so determined was:

   $\beta = 27.9$

u was increased to test the conclusion that a faster forgetting
rate would result in improved correctability. The interval between
presentations and/or predictions is 1.8 seconds, which is the same
as in previous experiments. Part of the reason for introducing the
virtual laterally inhibiting outstar was to use the accumulation of
experience with a small u to aid in resisting random mistakes by
absorption. Therefore we will not make u so large as to completely
destroy this effect. A decay time of twice the interval between
successive predictions and/or presentations was selected:

   $u = 0.278$ sec.$^{-1}$ $= 1/(2 \times 1.8$ sec.$)$

v was arbitrarily specified to be v = 10.
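The effect of this choice of u on retained experience can be illustrated with a short calculation. It assumes a z process that decays freely as $e^{-ut}$ between presentations, with no correlating input; the function name retained_fraction is hypothetical.

```python
import math

interval = 1.8                    # seconds between presentations
u_intermediate = 1.0 / (2.0 * interval)
u_slow = 0.01                     # value used in section 6.2


def retained_fraction(u, elapsed):
    """Fraction of z remaining after free decay at rate u for elapsed sec."""
    return math.exp(-u * elapsed)


kept_intermediate = retained_fraction(u_intermediate, interval)
kept_slow = retained_fraction(u_slow, interval)
print(round(u_intermediate, 3), round(kept_intermediate, 3),
      round(kept_slow, 3))
```

With the intermediate rate, roughly three fifths of the accumulated experience survives each interval, against about 98% with u = 0.01 sec.$^{-1}$.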

Since a pattern composed of M = 2.5 events is impossible, it was
decided to teach the outstar a pattern composed of 3 events and then
test its random mistake resistance. An additional event presented with
presentation phase $\varphi = +2\delta = 0.6$ seconds was included with
this pattern to illustrate the effect of the command node threshold.
After this part of the experiment it was decided to attempt correction
of the pattern with a pattern composed of M = 2 events. It was decided
to make the correcting pattern consist of an event not included in the
original pattern and an event that was included in the original pattern.
The reason for this selection of correcting events was to see if there
is any difficulty in learning that only part of a previously learned
pattern is in error.

Before beginning to teach the outstar an intelligible pattern,
a prediction of the ignorant pattern was obtained by excitement of the
command node alone. This was initially done to demonstrate that a
properly selected $\Gamma_x$ would prevent accumulation of experience
with the ignorant pattern. Because of the error in specifying
$\Gamma_x$, it serves as a demonstration that accumulation of experience
with the ignorant pattern is a factor to be considered.



The foregoing discussion is summarized in the box below:

Equations governing performance of the outstar:

$$\dot{x}_c(t) = -\alpha x_c(t) + P_c(t)$$

$$\dot{x}_i(t) = -\alpha x_i(t) + \beta y_i(t)\,[x_c(t - \tau) - \Gamma_c]^+ + P_i(t)$$

$$\dot{z}_{ci}(t) = -u z_{ci}(t) + v\,[x_c(t - \tau) - \Gamma_c]^+\,[x_i(t) - \Gamma_x]^+$$

$$y_i(t) = \frac{z_{ci}(t)}{\sum_{j=1}^{N} z_{cj}(t)}$$

where:

Input parameters:

pulse shape is rectangular
k = 10
$\delta = 0.3$ seconds

Network parameters:

$\alpha = 3.3333\ \text{sec.}^{-1} = 1/\delta$
$\beta = 27.9$
$\tau = 0.3$ seconds
$\Gamma_x$ = 0.^5
$u = 0.278\ \text{sec.}^{-1} = 1/(2 \times 1.8\ \text{sec.})$
$v = 10$

Initial conditions:

$x_c(0) = 0$
$x_i(0) = 0$ for all i
$z_{ci}(0) = 0.1$ for all i
and: $y_i(0) = 0.2$ for all i
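The equations in the box can be integrated numerically. The following is only a minimal forward-Euler sketch, not the simulation program used for the experiments. Several things in it are assumptions made for illustration: the step size `dt`, the threshold values `gamma_c` and `gamma_x` (the box does not give legible values for them), the reading of the input parameter 10 as the pulse amplitude, and the convention that the command node x process is zero before t = 0. The function and argument names are hypothetical.

```python
def rect(y):
    """[y]^+ : y if y > 0, otherwise 0."""
    return y if y > 0.0 else 0.0


def pulse(t, onset, length=0.3, height=10.0):
    """Rectangular input pulse; the height of 10 is an assumed amplitude."""
    return height if onset <= t < onset + length else 0.0


def simulate(command_onsets, grid_onsets, t_end, dt=0.001,
             alpha=1.0 / 0.3, beta=27.9, tau=0.3, u=1.0 / 3.6, v=10.0,
             gamma_c=0.5, gamma_x=0.5, n=5):
    """Forward-Euler integration of the boxed outstar equations.

    gamma_c and gamma_x are placeholder thresholds (assumptions).
    grid_onsets[i] lists the pulse onset times for grid node i.
    Returns (x_c, x, z, y) at time t_end.
    """
    steps = int(round(t_end / dt))
    lag = int(round(tau / dt))
    x_c = 0.0
    x = [0.0] * n
    z = [0.1] * n
    x_c_hist = []
    for k in range(steps):
        t = k * dt
        x_c_hist.append(x_c)
        # Delayed command amplitude x_c(t - tau); assumed 0 before t = 0.
        delayed = x_c_hist[k - lag] if k >= lag else 0.0
        signal = rect(delayed - gamma_c)
        total = sum(z)
        y = [zi / total for zi in z]
        p_c = sum(pulse(t, s) for s in command_onsets)
        dx_c = -alpha * x_c + p_c
        new_x, new_z = [], []
        for i in range(n):
            p_i = sum(pulse(t, s) for s in grid_onsets[i])
            dx = -alpha * x[i] + beta * y[i] * signal + p_i
            dz = -u * z[i] + v * signal * rect(x[i] - gamma_x)
            new_x.append(x[i] + dt * dx)
            new_z.append(z[i] + dt * dz)
        x_c += dt * dx_c
        x, z = new_x, new_z
    total = sum(z)
    return x_c, x, z, [zi / total for zi in z]


# One presentation: command event at t = 0, events 1 to 3 one lag later.
x_c, x, z, y = simulate([0.0], [[0.3], [0.3], [0.3], [], []], 1.8)
print([round(yi, 3) for yi in y])
```

With no inputs at all, the sketch leaves every $y_i$ at 0.2 while the z processes decay passively. With the presentation shown, the weights of the three presented nodes rise above 0.2 and the other two fall below it. The exact numbers depend on the assumed thresholds and should not be compared with the figures.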


section 6.5     An Experiment with a Virtual Laterally Inhibiting Outstar with Thresholds and an Intermediate Forgetting Rate Designed to Learn Patterns of More than One Event

Figure 6.5.1 shows the first phase of the experiment described in the previous section. The first response on the five grid node x process traces is a prediction of the ignorant pattern elicited by excitement of the command node alone. The z trace for all five $z_{ci}(t)$ shows the experience accumulated by this prediction. Although the increase in amplitude of the z processes due to this single prediction is small, many such predictions would result in an accumulation. Even this small accumulation of experience with the ignorant pattern affects the performance of the outstar when the pattern $V_c \to (V_1, V_2, V_3)$ is presented to the outstar, as is shown by the $y_i(t)$ traces. One presentation of the pattern is insufficient to result in convergence of the $y_i(t)$ to values describing the pattern, and a second presentation is required.

Even though the grid node threshold $\Gamma_x$ is too small to prevent accumulation of experience with the ignorant pattern, it does improve the learning performance of the outstar. Looking at the $x_5(t)$ trace, it can be seen that the first presentation of the pattern resulted in a redistribution of the values for the $y_i(t)$. This redistribution was sufficient to prevent $x_5(t)$ from going suprathreshold long enough to add any appreciable amplitude to $z_{c5}(t)$ on the second presentation of the pattern. Due to the reasonably rapid forgetting rate u, $z_{c5}(t)$ continued its decay during the second presentation. With $y_5(t)$ so small that $x_5(t)$ cannot be driven suprathreshold, future presentations and/or predictions will result in no further increases in the amplitude of $z_{c5}(t)$. This would be of particular importance if in the first two presentations of the pattern $y_1(t)$, $y_2(t)$, and $y_3(t)$ had not converged so closely to the final values describing the pattern of $y_i(t) = 0.3333$ for i = 1, 2, 3. For, if the $y_i(t)$ were not so close to their final values, then the prediction of the learned pattern would have resulted in further convergence of the $y_i(t)$ to this final value. The prediction of the learned pattern shown in the fourth response of the grid node x processes shows why. The prediction responses for the nodes $V_1$, $V_2$, and $V_3$ included in the pattern are all suprathreshold and result in an increase in amplitude for the corresponding $z_{ci}(t)$. The prediction responses for the nodes $V_4$ and $V_5$ not included in the pattern are subthreshold and therefore do not result in increases in the amplitudes of $z_{c4}(t)$ and $z_{c5}(t)$. Thus the $y_i(t)$ continue to converge during predictions. However, the $y_i(t)$ converged so close to their final values in the two presentations of the pattern shown that this effect cannot be seen in figure 6.5.1. A higher resolution look at the $y_i(t)$ showed that $y_1(t)$, $y_2(t)$, and $y_3(t)$ increased from 0.3096 to 0.3225 on this prediction. This phenomenon is not in contradiction to the outstar theorem, which guarantees only that the $y_i(t)$ will not diverge during a prediction. Convergence is therefore theoretically permissible, and grid node thresholds result in convergence during predictions.
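The convergence during predictions follows directly from the normalization $y_i = z_{ci} / \sum_j z_{cj}$. The sketch below illustrates this with invented z amplitudes and an invented increment (neither is taken from the experiment). Uniform passive decay leaves the $y_i$ unchanged, while an increase confined to the three pattern nodes moves their $y_i$ toward 1/3.

```python
def normalize(z):
    """Virtual lateral inhibition: y_i = z_i / sum_j z_j."""
    total = sum(z)
    return [zi / total for zi in z]


# Hypothetical z amplitudes for grid nodes 1..5 after two presentations.
z = [1.0, 1.0, 1.0, 0.1, 0.1]
y_before = normalize(z)

# Passive forgetting scales every z by the same factor, so y is unchanged.
decayed = [0.6 * zi for zi in z]
y_decayed = normalize(decayed)

# A prediction with suprathreshold responses on nodes 1..3 only
# adds amplitude to those three z processes (hypothetical gain).
gain = 0.2
z_after = [zi + gain if i < 3 else zi for i, zi in enumerate(decayed)]
y_after = normalize(z_after)

print(round(y_before[0], 4), round(y_after[0], 4))
```

In this toy case the pattern weights rise while remaining below 1/3, and the weights of the two nodes outside the pattern fall, which is the qualitative behavior described above.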

In figure 6.5.1, event 4 was presented $2\delta = 0.6$ seconds after events 1, 2, and 3 in the pattern. The command node threshold $\Gamma_c$ was chosen to prevent any correlation with events presented more than 0.3 seconds after arrival of the prediction signal at the arrowheads. The fact that $y_4(t)$ and $z_{c4}(t)$ are identical to $y_5(t)$ and $z_{c5}(t)$ shows that the command node threshold was successful. Presentation of event 4 resulted in a correlation equivalent to no presentation at all.

As can be seen, the $\beta$ selected resulted in learned prediction responses for the three events in the pattern of approximately the same amplitudes as the response elicited by an input pulse alone. (Compare the maximum amplitudes of the prediction responses of $x_1(t)$, $x_2(t)$, and $x_3(t)$ with the maximum amplitude of $x_c(t)$.)

Figure 6.5.2 shows the continuation of the experiment. The pattern $V_c \to (V_1, V_2, V_3)$ is presented with a simulated random mistake. Event 5 is this mistake. As can be seen, the presentation of the random mistake resulted in a healthy increase in $z_{c5}(t)$. However, this was insufficient to drive $y_5(t)$ large enough to result in a suprathreshold $x_5(t)$ on prediction. Therefore $z_{c5}(t)$ continues to decay on subsequent predictions and is bound for extinction. A slight decrease in $y_5(t)$ can be seen during the prediction response in figure 6.5.2. This is due to the prediction convergence phenomenon described above. Thus we can conclude that more predictions will result in the $y_i(t)$ converging back to the values they had before the occurrence of the random mistake.

Figure 6.5.3 shows the results of continuing the experiment. The previously learned pattern $V_c \to (V_1, V_2, V_3)$ is corrected by the pattern $V_c \to (V_1, V_4)$. The difficulty with this correcting pattern is that event 1 is included in both the original pattern and the correcting pattern. As can be seen, it only required four presentations of the correcting pattern to result in subthreshold $x_2(t)$ and $x_3(t)$ responses. $V_2$ and $V_3$ represent the events 2 and 3, which were part of the old pattern but are not included in the new pattern. Additionally, $y_2(t)$ and $y_3(t)$ have decreased in these four presentations to the point where it can be safely concluded that the dominant pattern is $V_c \to (V_1, V_4)$. This situation should be compared to the unsuccessful attempt to correct a pattern by three presentations of the correcting pattern in the virtual laterally inhibiting outstar with a slow forgetting rate shown in figure 6.3.3. It can be concluded that increasing the forgetting rate does improve the correctability of a virtual laterally inhibiting outstar.

The final values for the $y_i(t)$ to describe the correcting pattern are:

$y_2(t) = y_3(t) = y_5(t) = 0$

As can be seen, $y_1(t)$ has slightly overshot its final value and $y_4(t)$ has only reached a value of $y_4(t) = 0.38$. However, $y_1(t)$ and $y_4(t)$ are converging toward each other. We may conclude that the previously accumulated experience with event 1, which is common to both patterns, is great enough to make convergence to the new pattern difficult.

It should be noticed that the prediction responses of $x_1(t)$ and $x_4(t)$ at the end of the experiment are both of greater amplitude than a response to an input pulse alone. This is an effect of lateral inhibition. In the old pattern of M = 3 events, the prediction response amplitudes of grid nodes associated with the pattern were slightly less than the amplitude of a response to an input pulse alone. $\beta$ had been specified to result in a well learned response for a pattern consisting on the average of M = 2.5 events. Thus the 3 event pattern results in smaller than well learned grid node responses and the 2 event pattern results in larger than well learned grid node responses.
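The dependence on M can be made concrete with a rough calculation. The sketch assumes that a fully learned pattern of M equally weighted events has $y_i = 1/M$ on each pattern node (as with the final values $y_i = 0.3333$ quoted for M = 3), so that the prediction input to such a node is scaled by $\beta / M$. The helper name is hypothetical, and the calculation ignores the dynamics of the x processes.

```python
def prediction_gain(beta, m):
    """Scale factor beta * y_i for a pattern node, assuming y_i = 1/m."""
    return beta / m


beta = 27.9
for m in (3, 2.5, 2):
    print(m, round(prediction_gain(beta, m), 2))
```

Under this assumption the gain for the 3 event pattern (9.3) lies below the design value for M = 2.5 (11.16), and the gain for the 2 event pattern (13.95) lies above it, in line with the smaller and larger responses observed.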
CHAPTER 7     OTHER FORMULATIONS FOR THE z PROCESS
section 7.1     Introduction

In the discussion of the laterally inhibiting outstar it was mentioned that the outstar was excitory biased. The equation for the z processes in the laterally inhibiting outstar was:

6.1.1     $\dot{z}_{ci}(t) = -u z_{ci}(t) + v\,[x_c(t - \tau)\, x_i(t)]^+$

where:

$[y]^+ = y$ if $y > 0$
$[y]^+ = 0$ if $y \le 0$

By excitory biasing, it was meant that the learning z processes could only assume nonnegative values. Thus the input prediction signal to a grid node, $\beta z_{ci}(t)\, x_c(t - \tau)$, is always nonnegative and cannot drive the grid node's x process to negative amplitudes. In this way, the z processes are biased against learning to inhibit grid nodes and are biased in favor of learning to excite them.
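The excitory bias can be seen in a small numerical sketch of equation 6.1.1. Here the delayed command amplitude and the grid node amplitude are simply held constant, which is an artificial simplification. The values of u and v are borrowed from the chapter 6 experiment for illustration, and the forward-Euler step and function names are assumptions.

```python
def rect(y):
    """[y]^+ : y if y > 0, otherwise 0."""
    return y if y > 0.0 else 0.0


def step_z(z, xc_delayed, xi, u=0.278, v=10.0, dt=0.001):
    """One forward-Euler step of dz/dt = -u*z + v*[xc_delayed * xi]^+."""
    return z + dt * (-u * z + v * rect(xc_delayed * xi))


def run(z0, xc_delayed, xi, steps=1000):
    """Integrate for steps*dt seconds with constant (hypothetical) amplitudes."""
    z = z0
    for _ in range(steps):
        z = step_z(z, xc_delayed, xi)
    return z


z_inhibited = run(0.1, 1.0, -1.0)  # grid node held at a negative amplitude
z_excited = run(0.1, 1.0, 1.0)     # grid node held at a positive amplitude
print(z_inhibited > 0.0, z_excited > 0.1)
```

When the grid node is negative the rectified correlation is zero, so the z process only decays toward zero and never becomes negative. When the grid node is positive the z process grows. A z process of this form therefore cannot learn to inhibit.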

In this chapter we shall drop the excitory biasing restriction and conduct an investigation to see if there is any value in outstars which can learn to inhibit grid nodes as well as excite them by prediction signals from the command node. One reason for conducting this study is that in the laterally inhibiting outstar we had to introduce a new element among the embedding field network elements. The inhibitory directed edges' arrowheads contained z processes which were assigned the permanent value of -1. These z processes did not learn their values as do the z processes in the other arrowheads in the network, and we must consider a non-learning z process to be a new feature. In the avalanche using a long axon and collaterals we avoided the use of z processes with permanent values of +1. If we solve the pulse lengthening problems of the outstar avalanche, then we will have to use another new element. Development of a general formulation for z processes to cover all z processes would eliminate the need for making exceptions for special design features in a network. We will attempt to formulate more general z processes in this chapter. Throughout, we shall be speaking of embedding field networks which do not have any virtual processes associated with them. The networks we shall discuss conform to the embedding field elements of chapter one.


section 7.2     A Description of the States of the Processes in an Outstar

A z process at an arrowhead correlates the prediction signals arriving at the arrowhead and the x process at the node upon which the arrowhead impinges, and it remembers what the correlations in the past have been. The z process can therefore be considered to be a function of the past and current states of the adjacent node and the prediction signals. The z process itself can be thought of as being in various states. For instance, we can think of a large amplitude z process as being in an excitory state, as it allows large prediction signals through to excite the adjacent node. Small amplitude z processes could be thought of as being in an unlearned or ignorant state.

In this chapter we shall use this idea that z processes are in states which may be completely determined by the past history of the states of the prediction signal and the grid node x processes. We shall develop a state function $\mathcal{L}(x_c, x_i)$ which maps the states of the prediction signal $x_c$ and the grid node x process $x_i$ into a z process state $z_{ci}$:

$\mathcal{L}(x_c, x_i) = z_{ci}$

It will be found that this function $\mathcal{L}$ is a handy way to describe the logic behind the learning process in an outstar, and for this reason we shall call the state function $\mathcal{L}$ a "logic". However, before the usefulness of such a "logic" can be demonstrated, we must build up a description of the states of the various processes in an outstar.

In outstars without virtual processes, we are concerned with four processes:

1. Inputs, $P_c(t)$ and $P_i(t)$

2. Node x processes, $x_c(t)$ and $x_i(t)$

3. The prediction signal from the command node, $[x_c(t - \tau) - \Gamma_c]^+$, where $\Gamma_c$ may be zero

4. The z processes, $z_{ci}(t)$

Input pulses, $P_c(t)$ and $P_i(t)$, have been used to indicate the occurrence of events in the environment. There are two possible states for an event. Either it is occurring, or it is not. We have transmitted information about whether an event is occurring or not to the outstar by the input pulse. A positive amplitude has been used to signify that an event is occurring. A zero amplitude has been used to signify that an event is not occurring. The following code can therefore describe the state of inputs and the state of the events they describe:

(a) P = +1 indicates that an event is occurring and that the associated input has a positive amplitude.

(b) P = 0 indicates that an event is not occurring and that the associated input has a zero amplitude.

Node x processes have been used to signify the recent presentation of an event and/or a recent prediction of an event. A large positive amplitude has been interpreted as indicating that the outstar "thinks" that the event represented by the node in question has occurred recently, or at least should have occurred recently. Small positive amplitudes or zero amplitudes have been interpreted as indicating that the outstar is not "thinking" anything about the event represented by a node. Negative amplitudes have been interpreted as indicating the same state as small or zero amplitudes.

By placing thresholds on the nodes, we were able to precisely determine when an x process was of large enough positive amplitude to indicate that the outstar is "thinking" an event. With thresholds we may replace the word "large" in the preceding paragraph with the word "suprathreshold". In the same manner "small", "zero", and "negative" may be replaced with "subthreshold".

Thus we have two states for a node x process:

(1) $x_i = 1$ indicates a state where the x process at a node is of sufficiently large positive amplitude, or is suprathreshold. This state corresponds to the interpretation that the outstar is "thinking" about the event represented by the node.

(2) $x_i = 0$ indicates a state where the x process at a node is of small or zero positive amplitude, or is subthreshold. This state corresponds to the interpretation that the outstar is not "thinking" about the event represented by the node.

Although the notion of "thinking" about corresponds to the psychological interpretation of x processes' amplitudes, it is clumsy. In the outstar, the only way an x process can get into the state $x_i = 1$ is to respond to an input. That is, it must respond to excitement by an input pulse or an input prediction signal, or both. Thus we could describe the state $x_i = 1$ as "responding" or "excited". To avoid semantic difficulties, the state $x_i = 1$ will be called the "excited" state.

For semantic reasons also, the state $x_i = 0$ will not be called "not thinking" about. Although "not excited" would apply well to $x_i = 0$, it will not be used either. Instead the state $x_i = 0$ will be called "ambient". "Ambient" is used because it refers to a state which is the usual state of an x process. The ambient state $x_i = 0$ is also the passive state to which an x process always returns. Further, it is the state of an x process when it is not being actively driven by signals from outside the node. Thus it was felt that "ambient" accurately describes the state $x_i = 0$.

In the above listing of states for x processes, an x process responding with a negative amplitude was not included. Although we have followed the convention of interpreting negative amplitudes as being the same as ambient amplitudes, the inhibitory process that results in negative amplitudes is not an ambient process. A negative amplitude can be achieved only if the x process is being actively driven in the negative direction by signals from outside the node. It is therefore definitely not "ambient". There is no reason why our description of the states of x processes should have to conform with our interpretation of what those states mean. We will refer to an x process of negative amplitude as being in the inhibited state and indicate this state by $x_i = -1$. We will continue to interpret the state $x_i = -1$ as indicating the same interpretive state as $x_i = 0$.

The difficulty with the inhibited state is that it is a subjective state within the outstar. In the environment the state of an event can be described as actively occurring or passively not occurring. There is no such thing as an event that actively does not occur. However, we saw that a practical simple outstar with only the two x process states of being excited or being ambient had very little resistance to random mistakes. We added lateral inhibition to allow the outstar an active process whereby it could subjectively prevent events from occurring. Particularly, lateral inhibition was added to subjectively prevent random mistakes from occurring in a previously learned pattern.

Suppose we had a black box that was claimed to be a learning machine. The only way we could determine if it was a learning machine is to teach it something and then see if it could reproduce what we taught it. We would only be able to observe the events we were teaching it and the box's response. Now, the box's response would be events to us. Thus from our point of view the only states the box could communicate to us would be the state of a response occurring or the state of a response not occurring. The state of a response somehow being able to not occur with greater vigor than simply not occurring is meaningless. Thus, our interpretation of what an outstar is doing is limited to what we could observe if the outstar were a black box.

We have used this interpretive convention and will continue to do so. However, an outstar is not a black box to us. We can observe all the processes occurring inside it. Thus we are confronted with the inhibited x process state which we can observe inside the outstar, but which is meaningless when observed outside the outstar. Inside the outstar the inhibited state is meaningful and definitely corresponds to something other than ambient. Thus we have assigned a separate state to describe the state of an x process which is being actively driven to negative amplitudes by signals from outside the node.

There is some difficulty in saying when an x process is in the inhibited state in an outstar with thresholds. An x process can be actively driven subthreshold by inhibitory processes and still have a nonnegative amplitude. For simplicity this situation will be considered to be ambient. The inhibited state is therefore only the state in which an x process has a negative amplitude. In the case of a negative amplitude, there is no confusion about the x process at a node being actively driven toward negative values by signals from outside the node.

In summary, the states of an x process at a node are:

(1) The excited state, $x_i = +1$. The amplitude of the x process is large or suprathreshold.

(2) The ambient state, $x_i = 0$. The amplitude of the x process is small, zero, or subthreshold.

(3) The inhibited state, $x_i = -1$. The amplitude of the x process is negative.
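For an outstar with thresholds, this three-way classification can be written as a small function. The sketch is illustrative only. The function name is hypothetical, the threshold is assumed to be nonnegative, and an amplitude exactly equal to the threshold is treated as ambient, which is a choice the text does not address.

```python
def x_state(amplitude, threshold):
    """Return the state value of a node x process.

    +1: excited   (amplitude above the threshold)
     0: ambient   (nonnegative amplitude at or below the threshold)
    -1: inhibited (negative amplitude)
    """
    if amplitude < 0.0:
        return -1
    if amplitude > threshold:
        return 1
    return 0


# Hypothetical amplitudes with an assumed threshold of 0.5.
print([x_state(a, 0.5) for a in (1.2, 0.3, 0.0, -0.2)])
```

Note that an x process driven below threshold by inhibition, but still nonnegative, is classed as ambient, as the text stipulates for simplicity.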

A prediction signal at an arrowhead is the originating node's method of influencing the other nodes in the network. In order to define our logic $\mathcal{L}(x_c, x_i)$, we will have to assign states to prediction signals at an arrowhead. We could assign the same states to prediction signals as we have assigned to x processes. This would mean that the prediction signal is conveying the state of its originating node to the arrowhead. However, prediction signals do more than convey the state of the originating node to the arrowheads. They also influence the state of the x process at the node upon which the arrowhead impinges. There is no difficulty in allowing a prediction signal to have a large or suprathreshold amplitude and describing this state as the excited state with state value $x_c = +1$. However, the other states we may allow a prediction signal to be in require some discussion.

First, consider the case of a prediction signal coming from a node with a threshold on it. In the past we have used both "real" thresholds and "imaginary" thresholds. The imaginary thresholds were placed on a node for precision in interpreting when the node was responding. The "real" thresholds were placed on a node to prevent the z processes from learning spurious associations when the x process was of small amplitude. In the case of the command node, thresholds were used to prevent the command prediction signal from causing spurious associations to be learned when it was of small amplitude. This was accomplished by restricting the command prediction signal to be zero until it was suprathreshold, i.e. $[x_c(t - \tau) - \Gamma_c]^+$. In this case, we also prevented the prediction signal from influencing the state of the grid node upon which it was impinging until it was suprathreshold. This was accomplished by making the input prediction signal to the grid node be $\beta z_{ci}(t)\,[x_c(t - \tau) - \Gamma_c]^+$.

There is a reason behind this. Suppose we have an outstar grid which is shared by many command nodes representing separate and distinct command events. In the environment, a distinct pattern of grid events usually occurs with each of the command events. If the outstar is to function properly, it must be able to learn that a certain command event, $c_k$, is associated only with the pattern, $\theta_k$, which occurs with it in the environment. It must be prevented from learning that the patterns occurring with the other command nodes in the environment are associated with $c_k$.

A subthreshold command node x process only occurs when the command event has not occurred recently in the environment. Thus we can expect that a pattern not corresponding to this command event is on the grid when the command node is subthreshold. By making the prediction signal coming from a subthreshold command x process identically zero, we prevent the outstar from building up a wrong association. Additionally, by making the prediction signal identically zero, we prevent it from exciting the grid nodes which are included in the pattern associated with this particular command node. This is important. Consider two command nodes $V_{c1}$ and $V_{c2}$ which represent events $c_1$ and $c_2$ which occur in the environment with patterns $\theta_1$ and $\theta_2$ respectively. Suppose $x_{c1}(t)$ is subthreshold and $x_{c2}(t)$ is suprathreshold. Then we can expect that the grid node x processes indicate that the pattern $\theta_2$ is on the grid. We have already agreed to make the prediction signal $[x_{c1}(t - \tau) - \Gamma_{c1}]^+$ identically zero to prevent $V_{c1} \to \theta_2$ from being learned. Suppose however that we allow the prediction input signal from $V_{c1}$ to the grid nodes representing $\theta_1$ to become excited. The pattern on the grid would therefore be the algebraic sum $\theta_1 + \theta_2$. The prediction signal coming from the suprathreshold node $V_{c2}$ will therefore cause the association $V_{c2} \to \theta_1 + \theta_2$ to be learned. To prevent this possibility we have made the prediction signal input from a subthreshold command node identically zero.
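The two command node argument can be illustrated with a static snapshot. This is only a sketch under strong simplifications: the instantaneous drive to each grid node stands in for its x process response, and all numbers (amplitudes, weights, thresholds, the four-node grid) are invented for illustration. The function name and arguments are hypothetical.

```python
def rect(y):
    """[y]^+ : y if y > 0, otherwise 0."""
    return y if y > 0.0 else 0.0


def excited_nodes(inputs, weights, x_commands, gamma_c, gamma_x, beta, gated):
    """Return the set of grid nodes whose total drive exceeds gamma_x.

    inputs:     external input amplitude at each grid node
    weights:    one learned weight vector per command node
    x_commands: delayed x amplitude of each command node
    gated:      if True, a subthreshold command node sends no signal
    """
    n = len(inputs)
    drive = list(inputs)
    for w, x_c in zip(weights, x_commands):
        signal = rect(x_c - gamma_c) if gated else rect(x_c)
        for i in range(n):
            drive[i] += beta * w[i] * signal
    return {i for i in range(n) if drive[i] > gamma_x}


inputs = [0.0, 0.0, 10.0, 10.0]         # environment presents pattern theta_2
weights = [[0.5, 0.5, 0.0, 0.0],        # V_c1 has learned theta_1 (nodes 0, 1)
           [0.0, 0.0, 0.5, 0.5]]        # V_c2 has learned theta_2 (nodes 2, 3)
x_commands = [0.3, 2.0]                 # x_c1 subthreshold, x_c2 suprathreshold

with_gate = excited_nodes(inputs, weights, x_commands, 0.5, 0.5, 27.9, True)
without_gate = excited_nodes(inputs, weights, x_commands, 0.5, 0.5, 27.9, False)
print(sorted(with_gate), sorted(without_gate))
```

With the threshold gate only the nodes of $\theta_2$ are excited. Without it the nodes of $\theta_1$ are excited as well, so the grid carries $\theta_1 + \theta_2$ while $V_{c2}$ is sending its prediction signal, which is the situation the text warns against.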

Thus  the  prediction  signal  from  a  subthreshold  command  node  is 
identically  zero.  We  may  as  well  drop  the  fiction  of  assuming  that  a 
prediction  signal  was  sent  from  the  command  node  in  the  first  place 
and  say  that  a  prediction  signal  is  sent  out  along  the  directed  edges 
only  if  the  x  process  at  the  originating  node  is  suprathreshold. 

We also used "real" thresholds interpretively. We now have the case that a subthreshold x process at a node is interpreted as no response. Further, it is unable to influence other nodes in the network because no prediction signal is sent from this node. Thus a certain amount of consistency is added to our interpretation of the amplitudes of the x processes. An x process which indicates no response also has no effect on the other nodes and processes in the network. If we were unable to measure the amplitude of an x process at its node, we would have no way of knowing what amplitude it had as long as it was subthreshold. From the point of view of an external observer or any of the other processes in the outstar, a subthreshold x process is indeed ambient.

Thus we have an "ambient" state for prediction signals at an arrowhead. It is indicated by a zero amplitude and is assigned the state value $x_c = 0$. It must be remembered that this state arises from an originating node that was subthreshold $\tau$ time units before.

In the case of an outstar without thresholds, we lose the precision in defining when a prediction signal is ambient. We will therefore describe a small amplitude on a prediction signal to be ambient. As previously, "small" will mean small relative to the maximum amplitude of a well learned response.

Having made the prediction signal coming from a subthreshold x process identically zero, it would be silly to allow prediction signals coming from an inhibited x process to be nonzero. In this study we will not consider prediction signals of negative amplitude. Part of the reason is that allowing an inhibited x process to send out prediction signals would violate the consistency we have just developed. An x process state which is interpreted as no response should not be able to influence the other processes and nodes in the network. Another reason is that prediction signals of negative amplitude are not required. We have seen that the negative amplitude of inhibitory input prediction signals in lateral inhibition can be accounted for by allowing z processes with negative values. In fact, lateral inhibition has been the only case in which we have used inhibition. The whole function of lateral inhibition was for an excited grid node x process to inhibit the other nodes in the grid. Thus the emission of inhibitory prediction signals from a node was only useful when that node was in the excited state.

In summary, the states of a prediction signal at an arrowhead are:

(1) The excited state, $x_c = +1$. The amplitude of the prediction signal at the arrowhead is large and positive. This results from a large or suprathreshold x process at the originating node $\tau$ time units previously.

(2) The ambient state, $x_c = 0$. The amplitude of the prediction signal at the arrowhead is small or zero. This results from a small, zero, subthreshold, or negative x process at the originating node $\tau$ time units previously.

We will assign the following states to a z process based upon its amplitude:

(1) The excitory state, $z_{ci} = +1$. A z process is in this state when its amplitude is large and positive.

(2) The ambient state, $z_{ci} = 0$. A z process is in this state when its amplitude is small or zero.

(3) The inhibitory state, $z_{ci} = -1$. A z process is in this state when its amplitude is negative.

The states for z processes at an arrowhead were assigned according
to what effect a prediction signal modified by the z process would
have on the node upon which the arrowhead impinged. Clearly, a z
process with a large positive amplitude would result in prediction
excitement of the impinged upon node. A z process with a negative
amplitude would result in prediction inhibition of the node. A z
process with a small or zero amplitude would result in very little
disturbance of the impinged upon node. The ambient state for a z
process is also the passive state for a z process. With a non zero
forgetting rate, it is the state to which a z process passively returns,
and it is the state which a z process assumes when it has not been
perturbed by signals from outside the arrowhead.

Up to now, the only way a z process could assume the inhibitory
state was by permanent assignment of a negative value to the z process.
In what follows, we will consider new formulations for the equations
for a z process that will allow a z process to learn to assume the
inhibitory state.

section 7.3     Logics

Having described the states of the various processes in an outstar,
we are now ready to introduce the function
$\mathcal{L}(\bar{x}_c, \bar{x}_i) = \bar{z}_{ci}$. $\mathcal{L}$
describes how the state of a z process at an arrowhead is determined
from the states of the prediction signal at the arrowhead and the x process
at the adjacent node. Throughout this discussion the state of a
prediction signal will be denoted by $\bar{x}_c$. The state of the adjacent x
process will be denoted by $\bar{x}_i$, and the state of the z process will be
denoted by $\bar{z}_{ci}$. The choice for the subscripts was motivated by the
geometry of an outstar, but the discussion is not limited to outstars.
It applies to all networks which may be built from embedding field
elements. Throughout, the function $\mathcal{L}$ will be called a "logic". We
will introduce several distinct logics and they will be distinguished
by subscripts, e.g. $\mathcal{L}_0$.

A logic is a tabular function. That is, we tabulate all the
possible combinations of prediction signal states and x process
states and assign a z process state to each combination. For example,
the logic $\mathcal{L}_0$ for the excitory biased outstars we dealt with previously
is defined by:

Definition of the Excitory Biased Logic, $\mathcal{L}_0$

    $\bar{x}_c$      $\bar{x}_i$      $\mathcal{L}_0(\bar{x}_c, \bar{x}_i) = \bar{z}_{ci}$

     0        0        0
    +1        0        0
     0       +1        0
    +1       +1       +1
     0       -1        0
    +1       -1        0

(The inhibitory prediction signal state $\bar{x}_c = -1$ has been excluded
from consideration for reasons of consistency as explained in section 7.2.)

The reasons for calling this an excitory logic are clear. The
only states allowed for the z process are the ambient state $\bar{z}_{ci} = 0$,
and the excitory state $\bar{z}_{ci} = +1$. The ambient state is passive. The
z process does not actively learn to be in the ambient state. Therefore
the only state which the z process can actively learn is the excitory
state. Thus the z process is biased to learn only the excitory state.

The excitory biased logic, $\mathcal{L}_0$, is implemented in an outstar by
the equation:

7.3.1    $\dot{z}_{ci}(t) = -u z_{ci}(t) + v [x_c(t - \tau) - \Gamma_c]^+ [x_i(t) - \Gamma_x]^+$

where either or both thresholds can be zero.

The driving function in equation 7.3.1 is:

$v [x_c(t - \tau) - \Gamma_c]^+ [x_i(t) - \Gamma_x]^+$

This function is always non negative. It can actively drive the z
process only when the prediction signal and the adjacent x process are
both in the excited state. Additionally, because the driving function
is always non negative, it can only drive the z process in the direction
of increasing positive amplitudes. Thus our tabular definition of $\mathcal{L}_0$
conveniently summarizes the effects of equation 7.3.1 on the outstar.
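
The behaviour just described can be checked with a minimal Python sketch of the driving function in equation 7.3.1. The function names, the gain v = 1, the zero thresholds and the sample amplitudes standing in for the states +1, 0 and -1 are all assumptions made for illustration; they are not values used in the simulations reported here.

```python
def positive_part(y):
    """[y]^+ : y when y is positive, otherwise zero."""
    return y if y > 0.0 else 0.0


def excitory_drive(xc_delayed, xi, v=1.0, gamma_c=0.0, gamma_x=0.0):
    """Driving term of equation 7.3.1 (thresholds gamma_c, gamma_x are hypothetical)."""
    return v * positive_part(xc_delayed - gamma_c) * positive_part(xi - gamma_x)


# Sample amplitudes standing in for the states +1, 0 and -1 (assumed values).
samples = {+1: 1.5, 0: 0.0, -1: -1.5}

# Drive for every (prediction signal state, x process state) pair of the table.
drives = {
    (sc, si): excitory_drive(samples[sc], samples[si])
    for sc in (0, +1)
    for si in (0, +1, -1)
}
```

Only the pair (+1, +1) produces a non zero drive in this sketch, which is the single excitory entry of the tabulated logic.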

Note that $\mathcal{L}_0$ only describes the immediate effect of the states
of the prediction signal and the adjacent x process on the z process.
It does not describe the current state of a z process based on the
entire past history of the prediction signal and x process states.
That is, $\mathcal{L}_0$ only tells us in which direction the z process will be
driven by the signals at a given time.

We shall now consider other logics for z processes. A general
approach would be to consider all the possible assignments of $\bar{z}_{ci}$
states to each of the six distinct combinations of $\bar{x}_c$ and $\bar{x}_i$ states.
However, this results in $3^6 = 729$ logics. We will therefore have to use some
judgement in selecting the logics to be considered.

A key tenet of embedding field theory is that an excited
prediction signal and an excited x process should result in an excitory z
process. Thus we will only consider logics in which:

$\mathcal{L}(\bar{x}_c = +1, \bar{x}_i = +1) = +1$

Also, we have always started experiments with the z processes in
the ambient state. That is, the initial conditions on the z processes
have always been small or zero. We have interpreted those initial
conditions as a state of initial ignorance. It would be senseless to
allow a learning machine to develop from initial ignorance to learning
something by itself. For this reason we will only consider logics
in which:

$\mathcal{L}(\bar{x}_c = 0, \bar{x}_i = 0) = 0$

This reduces the possible logics to $3^4 = 81$. There are no
overriding reasons for excluding broad categories of the remaining logics.
However, 81 logics is just too many to consider. We will only consider
those which show promise in this study. These logics are defined in the
table below:
Table 7.3.1

                        $\bar{z}_{ci}$ assigned by
    $\bar{x}_c$    $\bar{x}_i$      $\mathcal{L}_0$    $\mathcal{L}_1$    $\mathcal{L}_2$    $\mathcal{L}_3$

     0      0       0     0     0     0
    +1      0       0     0    -1    -1
     0     +1       0     0    -1     0
    +1     +1      +1    +1    +1    +1
     0     -1       0     0    -1     0
    +1     -1       0    -1    -1    -1

$\mathcal{L}_0$ is the excitory biased logic we have considered previously.
$\mathcal{L}_1$ is the logic resulting from removing the non negative restriction
on the driving function in equation 7.3.1:

7.3.2     $\dot{z}_{ci}(t) = -u z_{ci}(t) + v x_c(t - \tau) x_i(t)$

As the tabulation of $\mathcal{L}_1$ shows, if $x_i(t)$ is negative, $z_{ci}(t)$ will learn
inhibition. $\mathcal{L}_1$ can be considered a neutrally biased logic because the
z process is not biased in favor of excitation or inhibition.

$\mathcal{L}_1$ is interesting, but of dubious value. Suppose that all the
z processes in a network are in the ambient state at the beginning of
an experiment. That is, the network is in a state of initial ignorance
at the beginning of an experiment. Then a z process in this network
can not possibly assume the inhibitory state. The reason is that
the only states for input pulses are $\bar{P} = +1$ and $\bar{P} = 0$. The input
pulses can only drive x processes in the positive direction, so the only
states the x processes in the network can assume are $\bar{x}_i = +1$
and $\bar{x}_i = 0$ due to input pulses. Therefore the prediction signals in the
network can only assume states $\bar{x}_c = +1$ and $\bar{x}_c = 0$. The combination
of states $\bar{x}_c = +1$, $\bar{x}_i = -1$ can not occur. By the tabulation of $\mathcal{L}_1$,
the state $\bar{z}_{ci} = -1$ can not be attained.

Thus the logic $\mathcal{L}_1$ is effectively equal to the logic $\mathcal{L}_0$. If we
allowed the permanent assignment of negative values to z processes in
a network governed by $\mathcal{L}_1$, then it is possible for the learning z processes
in the network to learn inhibition. However, this requires the
artificiality of a z process with a permanently assigned value.

$\mathcal{L}_3$ defined in table 7.3.1 is particularly interesting in an
outstar. As can be seen from the tabulation, z processes in a network
governed by $\mathcal{L}_3$ can learn inhibition from a state of initial ignorance
without the use of z processes with permanently assigned negative
values. The two assignments:

$\mathcal{L}_3(\bar{x}_c = +1, \bar{x}_i = 0) = -1$

$\mathcal{L}_3(\bar{x}_c = +1, \bar{x}_i = -1) = -1$

insure this. In an outstar, these assignments mean that a command
node can learn to inhibit grid nodes which do not correspond to events
in the pattern associated with the command event. Consider a command
event c which usually occurs with the pattern $\theta$ in the environment.
Let the grid events {i} be the events which compose this pattern.
Let the grid events {j} be the remaining events represented by
grid nodes. Then the assignment:

$\mathcal{L}_3(\bar{x}_c = +1, \bar{x}_i = +1) = +1$

means that the z processes $z_{ci}(t)$ associated with the grid nodes
included in the pattern will learn excitation. The assignment:

$\mathcal{L}_3(\bar{x}_c = +1, \bar{x}_j = 0) = -1$

means that the z processes $z_{cj}(t)$ associated with the grid nodes not
in the pattern will learn inhibition. Further, the assignment:

$\mathcal{L}_3(\bar{x}_c = +1, \bar{x}_j = -1) = -1$

insures that once these z processes have learned inhibition, they will
continue to do so. The result is that after having learned the pattern,
presentation of the command event alone will result in the grid nodes
included in the pattern being excited. The grid nodes not included
in the pattern will be inhibited. If a random mistake occurs in the
pattern, the learned inhibition will cause it to be suppressed.
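
The four logics of table 7.3.1 and the counting argument that led to them can be written out as a short Python sketch. The dictionary layout and the names `PAIRS`, `LOGICS` and `admissible` are choices made for this illustration only; the state values are those tabulated above.

```python
from itertools import product

# State pairs (prediction signal state, adjacent x process state), in table order.
PAIRS = [(0, 0), (+1, 0), (0, +1), (+1, +1), (0, -1), (+1, -1)]

# z process state assigned to each pair by the four logics of table 7.3.1.
LOGICS = {
    "L0": dict(zip(PAIRS, [0, 0, 0, +1, 0, 0])),
    "L1": dict(zip(PAIRS, [0, 0, 0, +1, 0, -1])),
    "L2": dict(zip(PAIRS, [0, -1, -1, +1, -1, -1])),
    "L3": dict(zip(PAIRS, [0, -1, 0, +1, 0, -1])),
}

Z_STATES = (-1, 0, +1)

# Every assignment of a z state to each of the six pairs: 3**6 candidates.
all_logics = [dict(zip(PAIRS, zs)) for zs in product(Z_STATES, repeat=len(PAIRS))]

# Keep only those satisfying the two restrictions adopted in the text:
# (+1, +1) must give +1 and (0, 0) must give 0.
admissible = [
    logic for logic in all_logics
    if logic[(+1, +1)] == +1 and logic[(0, 0)] == 0
]
```

Each of the four tabulated logics appears among the admissible candidates, so the table is a selection from the 81 logics left by the two restrictions.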

We will consider an outstar governed by $\mathcal{L}_3$ in detail in the next
chapter. The rest of this chapter will be devoted to an outstar
governed by $\mathcal{L}_2$.

section 7.4     Formulation of the z Process Conforming to Logic $\mathcal{L}_2$

The logic $\mathcal{L}_2$ is defined by the tabulation:

Table 7.4.1

    $\bar{x}_c$      $\bar{x}_i$      $\mathcal{L}_2(\bar{x}_c, \bar{x}_i) = \bar{z}_{ci}$

     0        0        0
    +1        0       -1
     0       +1       -1
    +1       +1       +1
     0       -1       -1
    +1       -1       -1

In this section we shall develop a formulation for a z process
that will conform to this tabulation. However, we might inquire
beforehand if this is a worthwhile endeavor. The large number of
inhibiting assignments makes $\mathcal{L}_2$ appear somewhat useless. In the
discussion of $\mathcal{L}_3$ in the previous section we saw that the following
assignments in table 7.4.1 are useful:

    $\bar{x}_c$      $\bar{x}_i$      $\mathcal{L}_2(\bar{x}_c, \bar{x}_i) = \bar{z}_{ci}$

     0        0        0
    +1        0       -1
    +1       +1       +1
    +1       -1       -1

We only have to establish the possible usefulness of the other two
assignments:

7.4.1      $\mathcal{L}_2(\bar{x}_c = 0, \bar{x}_i = +1) = -1$

7.4.2      $\mathcal{L}_2(\bar{x}_c = 0, \bar{x}_i = -1) = -1$

Assignment 7.4.1 above says that a z process will learn inhibition
if a grid node is excited and the prediction signal is not. This
combination can only occur if a pattern not corresponding to the command
event is on the grid. Thus learning to inhibit this pattern by
prediction when the command event is presented is useful. Assignment 7.4.2,
however, can get us into trouble. Suppose there are two command nodes
$V_{c_1}$ and $V_{c_2}$ sharing the same grid. Let the command events $c_1$ and
$c_2$ represented by these nodes usually occur with the distinct patterns
$\theta_1$ and $\theta_2$ respectively. Let the event represented by grid node
$i_1$ be an event which is included in pattern $\theta_1$ but not included in
pattern $\theta_2$. Then we can expect that excitement of $V_{c_2}$ will result
in the x processes of the grid nodes assuming the values describing
$\theta_2$. Additionally, because of assignment 7.4.1, node $i_1$ will be
inhibited. Therefore assignment 7.4.2 will result in $z_{c_1 i_1}(t)$ learning
inhibition. If this is learned sufficiently well, subsequent excitement
of $V_{c_1}$ will result in grid node $i_1$ being inhibited even though it is
part of the pattern $\theta_1$ associated with $c_1$.

This vividly illustrates some of the problems we can get into with
logic $\mathcal{L}_2$. It is not the only one. If it happens that the command
node or the grid nodes in an outstar are randomly excited for some time
then $\mathcal{L}_2$ will cause all the z processes in the outstar to learn
inhibition. When we get around to teaching the outstar the pattern associated
with the command event, we will have to overcome this initial inhibitory
biasing. In a real environment, this will probably be the case. Our
outstar will be "born" with all of its z processes in the ambient state.
It will then spend a period of time in the environment before "going to
school". In this period the random occurrence of the command event
and grid events is highly likely. Therefore, when the outstar "goes
to school" all of its z processes will probably be inhibitory biased.

In order to prevent this inhibitory biasing from destroying the
outstar's ability to learn when it goes to school, we will limit
its effect. That is, we will limit the maximum negative amplitude
of a z process to a value that will insure that positive associations
can not be completely inhibited. This rather vague statement will
become clearer as we progress in the study of an outstar governed
by $\mathcal{L}_2$.

A formulation for the z processes in an outstar that conforms
to $\mathcal{L}_2$ is:

7.4.1  $\dot{z}_{ci}(t) = -u z_{ci}(t) + v\left(a(x_c(t - \tau) + x_i(t))^2 - b(x_c(t - \tau) - x_i(t))^2\right)$
with b > a; a > 0

Expanding the right hand side of equation 7.4.1 we get:

7.4.2  $\dot{z}_{ci}(t) = -u z_{ci}(t) + v\left(-(b - a)x_c^2(t - \tau) - (b - a)x_i^2(t) + 2(a + b)x_c(t - \tau)x_i(t)\right)$
with b > a; a > 0

From equation 7.4.2 it can be seen that this formulation conforms to $\mathcal{L}_2$.
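
The conformity claimed here can be checked numerically with a small Python sketch. It evaluates the driving term of equation 7.4.1 at unit amplitudes standing in for the states +1, 0 and -1 (an assumption of this sketch) and records the sign of the result. The values a = 0.707, b = 1 and v = 0.25 are the ones specified in section 7.5; the function names are illustrative.

```python
def drive(xc_delayed, xi, a=0.707, b=1.0, v=0.25):
    """Driving term of equation 7.4.1."""
    return v * (a * (xc_delayed + xi) ** 2 - b * (xc_delayed - xi) ** 2)


def drive_expanded(xc_delayed, xi, a=0.707, b=1.0, v=0.25):
    """Driving term of equation 7.4.2 (the expanded form)."""
    return v * (
        -(b - a) * xc_delayed ** 2
        - (b - a) * xi ** 2
        + 2 * (a + b) * xc_delayed * xi
    )


def sign(y):
    """Return +1, 0 or -1 according to the sign of y."""
    return (y > 0) - (y < 0)


# (prediction signal state, x process state) pairs in table order.
pairs = [(0, 0), (1, 0), (0, 1), (1, 1), (0, -1), (1, -1)]

# Direction in which each state pair drives the z process.
learned_state = {p: sign(drive(float(p[0]), float(p[1]))) for p in pairs}
```

Under these assumptions the signs reproduce the six entries of the tabulation of the logic, and the two forms of the driving term agree at every pair.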

It is interesting exactly how this formulation came about. In
the progress of the experimental study for this thesis report, the
author began thinking of simulating an outstar on an analogue computer.
At that time the idea of logics had not been thought of. The author
was interested only in simulating an excitory biased outstar on an
analogue computer. To do this the z process driving function:

$v x_c(t - \tau) x_i(t)$

had to be simulated. The product of two varying signals is implemented
on an analogue computer by means of square law devices. For example,
the product xy is implemented by forming the sums:

x + y   and   x - y.

Each sum is sent through a separate square law device, the squared
outputs are scaled by constant factors a and b, and then the difference is
formed:

$a(x + y)^2 - b(x - y)^2$

expanded, this is:

$(a - b)x^2 + (a - b)y^2 + 2(a + b)xy$

Thus, by selecting the scale factors a and b such that:

a = b

the result of this process is:

$2(a + b)xy$

Scaling this by 1/(2(a + b)) results in the desired product.
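
The square law multiplication can be sketched in a few lines of Python. The function name and the sample inputs are assumptions for illustration; the arithmetic follows the expansion above.

```python
def square_law_product(x, y, a=1.0, b=1.0):
    """Form a(x + y)^2 - b(x - y)^2 and rescale by 1 / (2(a + b))."""
    difference = a * (x + y) ** 2 - b * (x - y) ** 2
    return difference / (2.0 * (a + b))


# With matched scale factors (a = b) the square terms cancel and x * y is recovered.
matched = square_law_product(3.0, -2.0)

# With a < b the terms (a - b)x^2 and (a - b)y^2 survive and pull the result down.
mismatched = square_law_product(3.0, -2.0, a=0.707)
```

The mismatched case is the one of interest in what follows: the leftover square terms are exactly what gives the z process an inhibitory bias.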

It was recognized that an outstar so simulated with a ≠ b would
have some of the desirable properties of $\mathcal{L}_3$. A digital simulation
of an outstar with the formulation 7.4.1 for the z processes was run.
The results were confusing and in an attempt to clearly define the
properties of this outstar the idea of logics and processes' states
was conceived. Having developed this concept, it was realized that it
was a handy description of the possibilities for formulating other z
processes. Additionally, it was a convenient method of predicting what
an outstar with various z process formulations would do.

The z process formulation given in equations 7.4.1 or 7.4.2 has
some interesting properties other than those described by the tabulation
of $\mathcal{L}_2$. The z process driving function in equation 7.4.1 is:

$D(t) = v\left(a(x_c(t - \tau) + x_i(t))^2 - b(x_c(t - \tau) - x_i(t))^2\right)$

This is composed of two competing processes. The process driving the
z process in the direction of an excited state is $a(x_c(t - \tau) + x_i(t))^2$.
Competing with it is the process $-b(x_c(t - \tau) - x_i(t))^2$ which drives
the z process in the direction of an inhibited state.

Of particular concern to us is the point where these competing
driving functions exactly balance one another. This point is achieved
when:

$a(x_c(t - \tau) + x_i(t))^2 = b(x_c(t - \tau) - x_i(t))^2$

Let $\mu$ be the ratio of the amplitude of a prediction signal at an
arrowhead and the adjacent node x process, i.e.:

$\mu = \frac{x_c(t - \tau)}{x_i(t)}$

then:

$\left(\frac{\mu x_i + x_i}{\mu x_i - x_i}\right)^2 = \frac{b}{a}$

or:

$\left(\frac{\mu + 1}{\mu - 1}\right)^2 = \frac{b}{a}$

since b/a > 1 > 0:

$\frac{\mu + 1}{\mu - 1} = \pm\sqrt{b/a}$    which is a real value.

Using the positive square root, we get:

$\mu_0^+ = \frac{\sqrt{b/a} + 1}{\sqrt{b/a} - 1}$

Using the negative square root, we get the inverse:

$\mu_0^- = \frac{\sqrt{b/a} - 1}{\sqrt{b/a} + 1}$

This calculation shows us that there are two ratios, $\mu_0^-$ and $\mu_0^+$,
where the competing driving functions are balanced. Note that $\mu_0^- =
1/\mu_0^+$ which is as it should be from the definition of $\mu$. For a
ratio between the prediction signal and the x process of $\mu$ where $\mu$
falls in the range:

$\mu_0^- < \mu < \mu_0^+$

the total driving function D(t) is positive. Thus the z process is
being driven in the excitory direction. Note that the bounds $\mu_0^-$ and
$\mu_0^+$ of this region are both positive. Since we do not allow negative
prediction signals, this means that D(t) is positive only when both
$x_c(t - \tau)$ and $x_i(t)$ are positive in conformity with $\mathcal{L}_2$. Outside
the region $\mu_0^- < \mu < \mu_0^+$, D(t) is negative and the z process is
being driven in the inhibitory direction.
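
As a numerical check on this calculation, the following Python sketch computes the two balance ratios and samples the sign of D(t) inside and outside the range between them. The function names and the sample points are illustrative; a = 0.707 and b = 1 are the values specified in section 7.5, and v = 1 is assumed since it does not affect the sign.

```python
import math


def crossover_ratios(a, b):
    """Return (mu0_minus, mu0_plus), the ratios at which D(t) vanishes (b > a > 0)."""
    s = math.sqrt(b / a)
    return (s - 1.0) / (s + 1.0), (s + 1.0) / (s - 1.0)


def drive(xc_delayed, xi, a, b, v=1.0):
    """D(t) = v (a (x_c + x_i)^2 - b (x_c - x_i)^2)."""
    return v * (a * (xc_delayed + xi) ** 2 - b * (xc_delayed - xi) ** 2)


a, b = 0.707, 1.0
mu_minus, mu_plus = crossover_ratios(a, b)

xi = 1.0  # hold the x process amplitude fixed and vary the ratio mu = x_c / x_i
inside = drive(0.5 * (mu_minus + mu_plus) * xi, xi, a, b)  # mu between the two ratios
below = drive(0.5 * mu_minus * xi, xi, a, b)               # mu below the lower ratio
above = drive(2.0 * mu_plus * xi, xi, a, b)                # mu above the upper ratio
```

The drive is positive only for the sample taken between the two ratios, and the two ratios are reciprocals of one another.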

The ratios $\mu_0^+$ and its reciprocal $\mu_0^-$ are called the cross
over ratios for obvious reasons. By specifying a and b to result in
a particular cross over ratio, we can specify a sort of "floating"
threshold on the z process. The thresholds we have considered
previously have all been "fixed". That is, the amplitude of the process
they were thresholding was compared to their fixed value. If it was
greater than this fixed value we got a different result than when it
was less. The floating threshold in the z process under consideration
is a function of the ratio of the amplitudes of the prediction signal
and the x process. If this ratio falls in a certain range we get one
result and if the ratio falls outside this range we get another. The
range is completely determined by the constants a and b.

One further analytic property of D(t) is that, for a fixed $x_i(t)$,
it is a concave (downward opening) quadratic function of the ratio
$\mu$. It therefore has a maximum with respect to $\mu$ which we compute:

$\frac{\partial D(t)}{\partial \mu} = v x_i^2(t)\left(-2\mu(b - a) + 2(a + b)\right) = 0$

or the maximum of D(t) with respect to $\mu$ occurs at:

$\mu_{max} = \frac{a + b}{b - a}$

Note that $\mu_0^- < \mu_{max} < \mu_0^+$.
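
The location of the maximum can be confirmed with a coarse numerical scan. This Python sketch assumes the ratio is defined as the prediction signal amplitude divided by the x process amplitude, works with D(t) divided by the positive factor v x_i^2, and uses a = 0.707 and b = 1 from section 7.5; the grid spacing and the names are arbitrary choices.

```python
def drive_per_unit(mu, a, b):
    """D(t) / (v x_i^2) written as a function of the ratio mu = x_c / x_i."""
    return a * (mu + 1.0) ** 2 - b * (mu - 1.0) ** 2


a, b = 0.707, 1.0

# Closed form location of the maximum, from setting the derivative to zero.
mu_max = (a + b) / (b - a)

# Coarse scan over positive ratios to locate the largest drive numerically.
grid = [k * 0.001 for k in range(1, 20001)]
mu_scan = max(grid, key=lambda mu: drive_per_unit(mu, a, b))
```

The scanned maximum agrees with the closed form to within the grid spacing.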

This says that the maximum "force" driving a z process in the
excitory direction occurs when the prediction signal and the x process
are in the ratio, $\mu_{max}$, to one another. There is no minimum to
D(t). Thus the driving function D(t) seems to be biased in favor of
driving the z process in the inhibitory direction. To compensate
for this and to cover the initial inhibitory biasing of this z process,
we will artificially bound D(t) on the negative side. That is, we will
use a driving function D*(t) defined by:

$D^*(t) = \delta_{-1}(M_z + z_{ci}(t)) D(t)$
where $M_z > 0$
and where:

$\delta_{-1}(y) = 1$ if $y \ge 0$
$\delta_{-1}(y) = 0$ if $y < 0$

By the proper selection of $M_z$, $z_{ci}(t)$ will be prevented from assuming
large negative values that would totally inhibit the learning of excitory
associations.

section 7.5     Specification of the Parameters in an Outstar Conforming
to Logic $\mathcal{L}_2$

By incorporating the equation for a z process developed in the
previous section, we get the equations governing an outstar conforming
to logic $\mathcal{L}_2$:

7.5.1  $\dot{x}_c(t) = -\alpha x_c(t) + P_c(t)$

7.5.2  $\dot{x}_i(t) = -\alpha x_i(t) + P_i(t) + \beta z_{ci}(t) x_c(t - \tau)$

7.5.3  $\dot{z}_{ci}(t) = -u z_{ci}(t) + v \delta_{-1}(M_z + z_{ci}(t)) \left(a(x_c(t - \tau) + x_i(t))^2 - b(x_c(t - \tau) - x_i(t))^2\right)$

With this formulation, the z processes in the arrowheads of the directed
edges from the command node can learn inhibition. If they do, then the
excitement of the command node will result in direct inhibition of the
grid nodes. For this reason, the outstar governed by equations 7.5
will be called a directly inhibiting outstar. We will run the same
experiment that we have used on other outstars. Therefore the parameters
of the directly inhibiting outstar are specified to be the same as in
the other outstars except where there are special considerations to
be made:
Input parameters:

    The input shape is rectangular
    A = 10
    pulse width = 0.3 seconds

Network parameters:

    $\alpha$ = 3.3333 seconds$^{-1}$
    $\tau$ = 0.3 seconds
    N = 3
    Initial condition on all variables is zero.

The presentation rate for presentations and/or predictions will
be 1.8 seconds. u will be specified such that the decay time 1/u
for the z processes will be twice the presentation rate:

u = 0.278 sec$^{-1}$ = 1/((2)(1.8 sec))

To specify a and b we must select the cross over ratio $\mu_0^+ = 1/\mu_0^-$.
$\mu_0 = x_c(t - \tau)/x_i(t)$ is the ratio between those functions at which the
competing driving functions in the z process balance. A cross over ratio
of $\mu_0^+ = 11.5$ was selected arbitrarily. Thus:

$b/a = \left((\mu_0^+ + 1)/(\mu_0^+ - 1)\right)^2 \approx 1.41$

Arbitrarily, b was selected to be b = 1.
Therefore a = 0.707.
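
The relation between the cross over ratio and the constants a and b can be worked through in both directions with a short Python sketch. The variable names are illustrative; the numbers 11.5, 1 and 0.707 are the ones quoted above.

```python
import math

mu_plus = 11.5                                     # cross over ratio chosen in the text
ratio = ((mu_plus + 1.0) / (mu_plus - 1.0)) ** 2   # b / a from the balance condition

b = 1.0
a_from_ratio = b / ratio                           # a implied by the chosen ratio

# Going the other way: the cross over ratio implied by the quoted a = 0.707.
s = math.sqrt(b / 0.707)
mu_from_a = (s + 1.0) / (s - 1.0)
```

The two directions agree with the quoted values to within rounding: the implied a lies close to 0.707 and the implied cross over ratio lies close to 11.5.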

With these parameters, v was experimentally determined on the two
presentations mean well learning criterion. The value of v so determined
was:

v = 0.25

$M_z$, the lower bound on negative excursions of the z processes,
requires some thought. $M_z$ should be specified such that an amplitude
of $z_{ci}(t) = -M_z$ will not prevent learning of excitory associations.
Consider equation 7.5.2 for the x processes when $z_{ci}(t) = -M_z$:

$\dot{x}_i(t) = -\alpha x_i(t) + P_i(t) - \beta M_z x_c(t - \tau)$

If the node $V_i$ is being excited by an input pulse we want the combination
of inputs to $V_i$,

$P_i(t) - \beta M_z x_c(t - \tau)$

to be sufficiently positive to drive $x_i(t)$ to values such that:

$\frac{x_c(t - \tau)}{\mu_0^-} > x_i(t) > \frac{x_c(t - \tau)}{\mu_0^+}$

If this condition is met, then the driving function for $z_{ci}(t)$ will
be positive and $z_{ci}(t)$ will move away from the value $z_{ci}(t) = -M_z$
in the excitory direction. In such a situation, the outstar will always
be able to learn that the command node is excitorally associated with
a grid event by sufficiently many presentations.

Analytically, the maximum amplitude for $x_c(t - \tau)$ is $(A/\alpha)(1 - e^{-1})$.
If we make $P_i(t) = A = \beta M_z (A/\alpha)(1 - e^{-1})$, then we could expect the
inhibitory input $\beta M_z x_c(t - \tau)$ and the excitory input $P_i(t)$ to
approximately cancel. In this case, $\beta M_z = \alpha/(1 - e^{-1}) = 5.28$. Since
$\beta = 1$, we therefore want $M_z < 5.28$ at least. To allow room for errors,
$M_z = 2.10$ was selected.
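
The bound on the inhibitory limit can be reproduced with a few lines of Python. The sketch assumes that the 0.3 second input parameter is the duration of the rectangular pulse, which is what makes the peak of the command node x process come out as (A/alpha)(1 - e^-1); the variable names are illustrative.

```python
import math

A = 10.0          # input pulse amplitude
alpha = 3.3333    # decay rate of the x processes (1/seconds)
width = 0.3       # assumed: duration of the rectangular input pulse (seconds)
beta = 1.0

# Peak of x_c when a rectangular pulse of height A drives
# dx/dt = -alpha * x + A starting from x = 0.
xc_peak = (A / alpha) * (1.0 - math.exp(-alpha * width))

# Value of M_z at which the inhibitory input beta * M_z * xc_peak
# just equals the excitory pulse height A.
mz_cancel = A / (beta * xc_peak)
```

The selected value of 2.10 sits well below this cancellation point, which is the margin for error referred to above.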

To investigate the effect of random occurrences of the command
event in inhibitorally biasing the outstar before it "goes to school",
the command node alone was excited once before presentation of the pattern.
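
A directly inhibiting outstar with these parameters can be sketched in Python as follows. This is not the simulation program used for the experiments reported here: the forward Euler scheme, the time step, the delay buffer and all function names are assumptions made for illustration, and the 0.3 second pulse duration is assumed as above. Only the equations 7.5.1 to 7.5.3 and the parameter values come from the text.

```python
from collections import deque


def step(y):
    """Unit step: 1 for y >= 0, otherwise 0."""
    return 1.0 if y >= 0.0 else 0.0


def pulse(t, onsets, amplitude=10.0, width=0.3):
    """Rectangular input: `amplitude` while t is inside any pulse, else 0."""
    return amplitude if any(t0 <= t < t0 + width for t0 in onsets) else 0.0


def simulate(command_onsets, grid_onsets, t_end, dt=0.001,
             alpha=3.3333, beta=1.0, tau=0.3, u=0.278, v=0.25,
             a=0.707, b=1.0, mz=2.10):
    """Forward-Euler integration of equations 7.5.1 to 7.5.3.

    grid_onsets[i] lists the pulse onset times (seconds) for grid node i.
    Returns the final z values, one per grid node.
    """
    n = len(grid_onsets)
    delay_steps = int(round(tau / dt))
    # history[0] holds x_c(t - tau); history[-1] holds x_c(t).
    history = deque([0.0] * (delay_steps + 1), maxlen=delay_steps + 1)
    xc = 0.0
    xi = [0.0] * n
    z = [0.0] * n

    for k in range(int(round(t_end / dt))):
        t = k * dt
        xc_delayed = history[0]
        next_xi, next_z = [], []
        for i in range(n):
            drive = (a * (xc_delayed + xi[i]) ** 2
                     - b * (xc_delayed - xi[i]) ** 2)
            dxi = (-alpha * xi[i] + pulse(t, grid_onsets[i])
                   + beta * z[i] * xc_delayed)
            dz = -u * z[i] + v * step(mz + z[i]) * drive
            next_xi.append(xi[i] + dt * dxi)
            next_z.append(z[i] + dt * dz)
        xc += dt * (-alpha * xc + pulse(t, command_onsets))
        xi, z = next_xi, next_z
        history.append(xc)  # the oldest sample drops off the left end
    return z


# Command node excited alone once; none of the three grid nodes is excited.
z_alone = simulate([0.0], [[], [], []], t_end=1.8)

# Command event at t = 0 with grid event 1 presented as the prediction
# signal arrives (t = tau); grid nodes 2 and 3 are left unexcited.
z_paired = simulate([0.0], [[0.3], [], []], t_end=1.8)
```

In this sketch, exciting the command node alone leaves every z process slightly negative, while pairing the command event with a grid event drives that node's z process positive and the others negative.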


section 7.6     Experiments with a Directly Inhibiting Outstar

Figure 7.6.1 shows the results of performing an experiment with
the directly inhibiting outstar specified in the last section. Note
that excitement of the command node alone at the beginning of the
experiment results in small negative amplitudes for the z processes.
The directly inhibiting outstar is thus slightly inhibitorally biased
before "going to school". "School" begins with the second presentation
of the command event. From the $x_2(t)$ trace it can be seen that the
pattern $V_c \to V_2$ was approximately well learned in two presentations.
Event 1 is not presented. The $z_{c1}(t)$ trace shows that the outstar
has learned to directly inhibit grid node $V_1$.

Event 2 was presented with $\phi = 0$ presentation phase with respect
to the arrival of the prediction signal. (Presentation phase has been
explained in section 3.5.) Event 3 was presented with presentation
phase $\phi = 0.6$ seconds after event 2. As can be seen from the $x_3(t)$
and $z_{c3}(t)$ traces, the outstar has learned to inhibit grid node $V_3$.

The experiment was continued to test the resistance of the directly
inhibiting outstar to random mistakes in the pattern. Figure 7.6.2
shows the results. Event 1 is the simulated random mistake. As can
be seen from the $x_1(t)$ and $z_{c1}(t)$ traces, the direct inhibition the
outstar learned before the occurrence of this mistake resulted in little
damage to the pattern. $z_{c1}(t)$ rose to a small positive amplitude
which is decaying. The prediction following occurrence of the mistake
did not cause $z_{c1}(t)$ to increase. Thus we may conclude that the outstar
will forget the mistake entirely in time.

[Figure 7.6.1: traces of the input pulses $P_c(t)$, $P_i(t)$, the x processes $x_c(t)$, $x_i(t)$ and the z processes $z_{ci}(t)$ against time in seconds.]

[Figure 7.6.2: traces of the input pulses $P_c(t)$, $P_i(t)$, the x processes $x_c(t)$, $x_i(t)$ and the z processes $z_{ci}(t)$ against time in seconds.]

[Figure 7.6.3: traces of the input pulses $P_c(t)$, $P_i(t)$, the x processes $x_c(t)$, $x_i(t)$ and the z processes $z_{ci}(t)$ against time in seconds.]

The experiment was continued to test the correctability of the
directly inhibiting outstar. Figure 7.6.3 shows the results. An attempt
was made to correct the previously learned pattern $V_c \to V_2$ with the
correcting pattern $V_c \to V_3$ by presenting $V_c \to V_3$ twice. Figure 7.6.3
shows that the attempt was unsuccessful. The first presentation
of $V_c \to V_3$ was treated like a random mistake. The previously learned
inhibition of $V_3$ was sufficient to prevent $z_{c3}(t)$ from rising to much
of a positive amplitude. The next presentation of $V_c \to V_3$ did result
in a healthy increase in $z_{c3}(t)$. Further presentations of $V_c \to V_3$
will result in it being learned better. However, $V_c \to V_2$ was not
"unlearned" during this time. Both excitements of the command node
resulted in approximately well learned responses by $x_2(t)$.

The only method by which this directly inhibiting outstar can
correct a pattern is to forget the old pattern while learning the new
pattern. From the $z_{c2}(t)$ trace we can see that the presentation rate
for $V_c \to V_3$ was just right to result in "pumping up" $z_{c2}(t)$ such
that $V_c \to V_2$ remained well learned during the correction attempt.
Thus the outstar could not forget $V_c \to V_2$ while learning $V_c \to V_3$. The
addition of lateral inhibition and/or increasing the forgetting rate
u would probably increase the correctability, but these options were
not investigated.

In the discussion of the logic $\mathcal{L}_2$ in section 7.4, it was noted
that random excitation of the grid nodes without excitation of the
command node might result in inhibition of a learned pattern. The
assignment:

$\mathcal{L}_2(\bar{x}_c = 0, \bar{x}_i = +1) = -1$

is the source of this possible trouble. It was decided to see if this
was indeed a problem. All the grid nodes were excited twice without
exciting the command node. The command node was then excited to see
what would be predicted on the grid. Note that because of the
unsuccessful correction attempt, the outstar had learned $V_c \to (V_2, V_3)$
at the end of figure 7.6.3.

Figure  7«6.^-  shows  the  result,  z   0(t)  and  z  At)   were  very 
slightly  driven  in  the  direction  of  inhibition  by  the  grid  node 
excitements.  However,  as  the  piediction  shows,  the  pattern  V— ^(Vp,  V.) 
is  still  in  the  outstar's  memory.  It  can  still  be  completely  recovered 
by  "pumping  up". 

This result does not mean that there is no problem with random
excitements of the grid nodes in an outstar conforming to ℒ_2. It
only means that it is not a significant problem in the outstar under
study.

section 7.7     Generality of the Formulation of the z Process
Conforming to Logic ℒ_2

The z process formulation conforming to logic ℒ_2 that we have
used is:

7.7.1  \dot{z}_{ci}(t) = -u z_{ci}(t) + v \delta_{-1}(M_z + z_{ci}(t)) [ a (x_c(t - \tau) + x_i(t))^2 - b (x_c(t - \tau) - x_i(t))^2 ]

where:

\delta_{-1}(y) = \begin{cases} 1 & \text{if } y \ge 0 \\ 0 & \text{if } y < 0 \end{cases}

As was shown in section 7.5, setting a = b in equation 7.7.1
will result in:

7.7.2  \dot{z}_{ci}(t) = -u z_{ci}(t) + \delta_{-1}(M_z + z_{ci}(t)) 2v(a + b) x_c(t - \tau) x_i(t)

Equation 7.7.2 describes a z process conforming to the neutrally
biased logic of table 7.3.1. By setting M_z = 0 in equation 7.7.2,
we get the excitory logic ℒ_0 of table 7.3.1, which has been the logic
we have used in the simple and laterally inhibiting outstars. Thus
the z process formulated by equation 7.7.1 is rather general. By
specifying the parameters a, b, and M_z we have a choice of which logic
and what type of outstar we shall get.
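
The reduction from the general formulation to the product form can be checked numerically. The following is a minimal sketch, not the simulation program used in this thesis: the function names are hypothetical, the unit step is assumed to be 1 for non-negative arguments, and the numerical values passed in the check are chosen only for illustration.

```python
def unit_step(y):
    """Unit step delta_{-1}(y); assumed here to be 1 for y >= 0 and 0 otherwise."""
    return 1.0 if y >= 0 else 0.0


def z_rate_general(z, xc_delayed, xi, u, v, a, b, Mz):
    """Right-hand side of equation 7.7.1 for one arrowhead."""
    growth = a * (xc_delayed + xi) ** 2 - b * (xc_delayed - xi) ** 2
    return -u * z + v * unit_step(Mz + z) * growth


def z_rate_product(z, xc_delayed, xi, u, v, a, b, Mz):
    """Right-hand side of equation 7.7.2 (valid when a == b)."""
    return -u * z + unit_step(Mz + z) * 2.0 * v * (a + b) * xc_delayed * xi


# Illustrative values only (assumptions, not taken from an experiment).
args = dict(z=0.1, xc_delayed=0.7, xi=0.3, u=0.278, v=1.0, Mz=2.0)
general = z_rate_general(a=0.5, b=0.5, **args)
product = z_rate_product(a=0.5, b=0.5, **args)
```

With a = b the two rates agree to rounding error. With a < b and only the grid node excited (xc_delayed = 0), the general form gives a negative rate, which is the inhibitory assignment discussed in section 7.6.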

The general application of equation 7.7.1 does not end there.
By appropriate specification of the parameters a, b, M_z, and v we can
make a z process governed by it "practically inhibitorally biased".
Suppose, for example, that we wished to make a laterally inhibiting
outstar. We connect all of the grid nodes with directed edges and
arrowheads. Previously we have used z processes with a permanently
assigned negative value to get laterally inhibiting prediction signals.
However, we can now make all the z processes in the network conform
to equation 7.7.1. By proper selection of a, b, and v we can make the
z processes in the laterally inhibiting arrowheads negative most of
the time.

To do so, we depend on the statistics of the environment. It is
unlikely that any two x processes in the grid will be excited to
identically equal amplitudes for very many times in succession.
Therefore,  by  specifying  the  cross  over  factors  &        =  1/u"   =  1, 
we can be almost certain that the z processes in the arrowheads will
learn inhibition.

An experiment was conducted to test this conclusion. Two nodes,
V_1 and V_2, were connected by a directed edge as shown at the bottom of
figure 7.7.1. The originating node, V_1, was excited four times in
succession by input pulses. The "receiving" node, V_2, was excited
twice exactly when the prediction signal arrived at the arrowhead.
The parameters used in the experiment were:
Input  parameters: 

Input  pulse  shape  is  rectangular 

A  =  10 

o  -   0.3  seconds 
Network  parameters: 

All  initial  conditions  were  zero 

α = 3.3333 seconds^-1

u = 0.278 seconds^-1

τ = 0.3 seconds

v  =  1.0 

a = 0.12

Figure 7.7.1. Demonstration of a z process which learns inhibition.


Network parameters continued:
b  =  l.uo 


M_z = 2.10

From  the  selection  of  a  and  b,  the  cross  over  ratio  A   =  1/m  q" 
was  computed  to  be: 

K  =  2 

Figure 7.7.1 shows the result. The initial excitement of V_1 alone
resulted in the z_{12}(t) process being driven to its negative limit,
-M_z. The two presentations of event 2 exactly at the time that the
prediction signal x_1(t - τ) arrived at the arrowhead resulted in
z_{12}(t) being driven to a positive amplitude. However, the fourth
excitement of V_1 resulted in z_{12}(t) returning to inhibitory values.
Thus we may conclude that the z_{12}(t) process will behave as an inhibitory
process most of the time. Note also that we did not have to specify
the cross over factor to be exactly 1 to get this result.

Of  course,  specifying  a  =  0  in  equation  7.7.1  would  make  the  z 
process  always  inhibitorally  biased.  The  above  experiment  was  conducted 
to  show  that  we  did  not  have  to  go  to  this  extreme  to  get  the  desired 
results. 
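
The dependence of the sign of the learning term on the relative amplitudes of the two x processes can be sketched as follows. This is an illustration under stated assumptions, not the computation used for the experiment: it assumes that the cross over ratio is the amplitude ratio x_c/x_i (greater than one) at which the bracketed term of equation 7.7.1 changes sign, the function names are hypothetical, and the value b = 9a used in the check is an illustrative choice rather than a parameter taken from the experiment.

```python
import math


def growth_term(xc, xi, a, b):
    """Bracketed term of equation 7.7.1: a(xc + xi)^2 - b(xc - xi)^2."""
    return a * (xc + xi) ** 2 - b * (xc - xi) ** 2


def cross_over_ratio(a, b):
    """Ratio r >= 1 of xc to xi at which growth_term is zero (needs 0 <= a < b).

    Solving a(r + 1)^2 = b(r - 1)^2 for r > 1 gives
    r = (sqrt(b) + sqrt(a)) / (sqrt(b) - sqrt(a)); 1/r is the other root.
    """
    if not 0 <= a < b:
        raise ValueError("this sketch assumes 0 <= a < b")
    root_a, root_b = math.sqrt(a), math.sqrt(b)
    return (root_b + root_a) / (root_b - root_a)


# Illustrative choice b = 9a (an assumption made for this example).
ratio = cross_over_ratio(0.12, 1.08)
```

Under this reading, amplitude ratios between 1/r and r give a positive (excitory) learning term and ratios outside that band give a negative (inhibitory) one. Setting a = 0 shrinks the band to the single ratio 1, which agrees with the remark that a = 0 makes the process always inhibitorally biased.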

If we go to the other extreme and specify b = 0 in equation 7.7.1,
we get:

7.7.3  \dot{z}_{ci}(t) = -u z_{ci}(t) + v a \delta_{-1}(M_z + z_{ci}(t)) (x_c(t - \tau) + x_i(t))^2

This formulation will result in the z process being driven to
positive amplitudes whenever x_c(t - τ), or x_i(t), or both, are
nonzero. Thus we can replace the permanently assigned positive z processes
in the command node cascade in an avalanche with "learning" z processes
that are governed by the same general equation as all the other z
processes in the avalanche.

The z process formulation given by equation 7.7.1 is therefore
general enough to be used in all the applications we have found for
z processes in outstars and avalanches. We could specify that all
the z processes in a network be governed by this formulation. The
special features of the network such as a command cascade or lateral
inhibition can be implemented by appropriate selections for the
parameters a, b, and M_z. Thus the design of an outstar or an avalanche
could be reduced to specification of these parameters at each of the
arrowheads in the network.

CHAPTER 8     THE CHEMICAL OUTSTAR
section 8.1     Introduction

At  this  point  there  are  three  outstanding  promises  made  in  the 
previous  chapters.   In  the  introduction  to  chapter  one,  it  was  promised 
that  this  thesis  would  examine  Grossberg's  theoretical  proposal  for 
the  neurophysiological  processes  that  allow  a  living  organism  to 
learn.  In  chapter  five  it  was  promised  that  a  solution  to  "pulse 
lengthening"  in  a  cascade  of  nodes  would  be  developed.  In  chapter 
seven  it  was  promised  that  an  examination  of  a  logic  corresponding  to 
logic  <Kq  in  table  7.^.1  would  be  made. 

We shall keep these promises in this chapter. A synthesis of
all three will be developed and we shall examine its performance.

section 8.2     The Analogy Between Embedding Field Networks and the
Nervous System of Living Organisms

Figure  8.2.1  shows  the  analogy  between  embedding  field  network 
elements  and  the  elements  of  the  nervous  system  of  a  living  organism. 
A  thorough  perusal  of  figure  8.2.1  would  explain  this  analogue  to  the 
reader  better  than  volumes  of  words. 

For  the  uninitiated,  a  brief  description  of  the  neurophysiological 
elements and processes shown in figure 8.2.1 is offered. The dark
cell body and axon shown is an interneuron in the spinal column of
a vertebrate. The light cell body and axon is a motoneuron. Neurons
are living cells. They occur in organisms in a variety of shapes.
However, they always consist of a reasonably elongated part called
an axon, and a "fatter" part called the cell body. The cell body
contains the cell's nucleus. An interneuron and a motoneuron were
chosen for figure 8.2.1 because they have been extensively studied and
the information shown was easy to collect.

The traces shown are voltages recorded by microelectrodes inserted
into the interneuron and the motoneuron at the places shown. These
recordings correspond to the following sequence of events: The
interneuron is excited by an electrical signal delivered to the cell
body by a microelectrode. This signal results in the membrane potential
of the cell body rising from its resting potential of approximately
-70 mV. There are two parts to this positive increase in the cell
body membrane potential: the excitory post synaptic potential, EPSP,
and the action potential (spike). The EPSP is the lower trace which
is shown as a solid line. If the EPSP does not rise to suprathreshold
values, then it is the only signal recorded at the cell body. Further,
a subthreshold EPSP does not result in an action potential (spike)
being propagated down the axon.

When the EPSP rises to suprathreshold values, a spike is propagated
down the axon. In addition, the spike is "reflected" back into
the cell body giving rise to the dotted line spike trace shown
superimposed on the EPSP.

The spike is formed at the point where the cell body narrows
down to form the axon. It propagates down the axon at a finite velocity
which is on the order of 5 meters/sec. to 100 meters/sec. The
type of neuron and the covering on the axon determine the propagation
velocity. In a particular type of neuron, the propagation velocity
is fixed. All spikes are transmitted at the same velocity. Spikes
also always have the same amplitude and shape.

The  end  of  an  axon  generally  breaks  up  into  a  number  of  collaterals. 
Each  collateral  ends  in  a  swelled  portion  called  a  bouton.  These 
boutons  are  located  immediately  adjacent  to  another  neuron's  cell 
body.  The  bouton-cell  body  junction  is  called  a  synapse.  For  this 
reason  the  geometric  arrangement  of  the  neurons  shown  is  described 
as an interneuron "synapsing" on a motoneuron. We have shown the spike
propagated down the axon as it arrives at the synapse. Note that it
is delayed due to the finite transmission velocity.

A spike arriving at a synapse causes the adjacent cell body
membrane potential to rise from its resting potential with an EPSP.
If the EPSP rises to suprathreshold values, a spike is propagated
down this neuron's axon.

There is a short delay between the arrival of a spike at the
synapse and the beginning of an EPSP at the adjacent cell body. This
is because the cell body being synapsed upon is not excited electrically
by the spike. Instead, the spike causes the release of a chemical
substance in the space between the bouton and adjacent cell body.
This chemical substance is called transmitter. It causes the EPSP
in the synapsed upon cell body by changing the cell body's permeability
to different ionic species.

A  magnification  of  a  synapse  is  shown.  The  space  between  the  bouton 
and  the  cell  body  is  called  the  synaptic  cleft.  Under  an  electron 
microscope,  the  synaptic  cleft  is  revealed  to  hold  a  number  of  small 
particles called vesicles. It is currently believed that these vesicles
are  packages  of  transmitter  which  burst  open  when  a  spike  arrives  at 
the  synapse. 

The reason for these voltage traces is relatively easy to understand.
A neuron is surrounded by an interstitial fluid in which various ions
are dissolved. The interior of a neuron is also a fluid like substance
in which ions are soluble. The boundary between the interior of the
neuron and the interstitial fluid is a membrane which is selectively
permeable to ions. In a neuron at rest, the membrane is permeable
to potassium ions, K+, but reasonably impermeable to sodium ions, Na+.
There is additionally a "sodium pump" in the membrane which continuously
ejects Na+ ions from the neuron's interior. To maintain electrical
and chemical equilibrium of the overall system, there is a higher
concentration of K+ inside the neuron than outside. The reverse is
true for Na+. The result is that the interior of the neuron is
approximately 70 millivolts negative with respect to the interstitial fluid.

Electrical stimulation of the membrane results in a sudden change
in the membrane permeability. The membrane becomes permeable to
Na+ ions and they diffuse into the neuron. This results in a sudden
increase in the voltage of the neuron's interior with respect to the
interstitial fluid. In a very short time the membrane regains its
impermeability to Na+ ions. K+ ions then diffuse out of the neuron
to redress the equilibrium and the potential across the membrane drops
to the resting potential. The net effect is a small loss of K+ ions
and a small increase of Na+ ions inside the neuron. The sodium pump
will redress this in a short time. Thus with microelectrodes inserted
into the neuron the potential across the membrane can be measured
and electrical traces similar to those shown can be recorded.

Release of the transmitter substance in the synaptic cleft by a
spike causes similar membrane permeability changes which result in
an EPSP.

Next to the neurons we have shown the geometrical elements and
processes which occur in embedding field elements. Grossberg has
proposed the following analogy between the neurophysiological phenomena
in an organism and embedding field theory:

Embedding Field Theory          Living Organism

Geometric elements:

  node                          cell body
  directed edge                 axon
  arrowhead                     synapse

Processes:

  x process                     cell body membrane potential
  prediction signal             action potential (spike)
  amplitude of z process        amount of transmitter substance
                                available in synaptic cleft

Except for the last correspondence, figure 8.2.1 shows that the
analogy is in general very good. There are differences in detail which
we will take the time to explain here.

The x processes shown are not divided into an EPSP and a superimposed
spike. Further, the maximum amplitude of the prediction signal
is directly proportional to the amplitude of the x process which, in
turn, is directly proportional to the amplitude of the input pulse.
The amplitude of a spike on an axon is constant and independent of the
amplitude of the signal exciting the cell body.

However,  the  situation  we  have  shown  on  the  interneuron  is  the 
response  to  a  single  excitation  of  short  duration  and  limited  amplitude. 
In  the  usual  case  the  EPSP  is  suprathreshold  for  a  reasonably  long 
time.  This  results  in  a  barrage  of  spikes  being  propagated  down  the 
axon.  The  frequency  of  these  spikes  is  proportional  to  the  strength 
of  the  stimulus  exciting  the  cell  body.  In  Grossberg's  proposal, 
the  amplitude  of  the  portion  of  the  x  process  that  is  suprathreshold 
is  considered  to  be  proportional  to  the  spiking  frequency  in  a  neuron. 
Thus a prediction signal represents a barrage of spikes.

section 8.3     Summary of the Theoretical Proposal for the
Neurophysiological Process of Learning in
Living Organisms

We  have  seen  that  an  outstar  network  composed  of  embedding  field 
elements  is  capable  of  learning.  The  key  to  this  ability  is  the  z 
process  at  an  arrowhead.  The  z  process  at  an  arrowhead  correlates  the 
prediction  signal  arriving  at  the  arrowhead  with  the  x  process  at  the 
adjacent  node.  It  remembers  this  correlation  in  its  amplitude  and 
allows  prediction  signals  to  excite  the  adjacent  node  proportional  to 
its  amplitude.  By  writing  down  the  equations  governing  the  embedding 
field  network  shown  in  figure  8.2.1,  we  can  see  this  clearly: 

8.3.1  \dot{x}_1(t) = -\alpha x_1(t) + P_1(t)

8.3.2  \dot{x}_2(t) = -\alpha x_2(t) + P_2(t) + \beta z_{12}(t) [x_1(t - \tau) - \Gamma_x]^+

8.3.3  \dot{z}_{12}(t) = -u z_{12}(t) + v [x_1(t - \tau) - \Gamma_x]^+ [x_2(t) - \Gamma_x]^+

where:

[y]^+ = \begin{cases} y & \text{if } y > 0 \\ 0 & \text{if } y \le 0 \end{cases}

From the x_2(t) trace in figure 8.2.1, we can conclude that z_{12}(t) has
already learned that V_1 and V_2 are associated. That is, z_{12}(t) > 0 and
is of sufficient amplitude to result in a well learned prediction response
by x_2(t).
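
The behavior of equations 8.3.1 through 8.3.3 can be sketched with a simple forward Euler integration. This is a minimal illustration and not the simulation program used in this thesis: the function names, the step size dt, the values of beta and the threshold, and the pulse timings are all assumptions made for the example; alpha, u, v, tau, and the pulse amplitude and width are borrowed from the experiment of section 7.7.

```python
def rect_pulse(start, width, amplitude):
    """Return P(t) for one rectangular input pulse."""
    def pulse(t):
        return amplitude if start <= t < start + width else 0.0
    return pulse


def positive_part(y):
    """[y]^+ : y if y > 0, otherwise 0."""
    return y if y > 0 else 0.0


def simulate(p1, p2, alpha=3.3333, beta=1.0, u=0.278, v=1.0,
             tau=0.3, gamma_x=0.0, dt=0.001, t_end=5.0):
    """Forward Euler integration of equations 8.3.1 to 8.3.3 from rest."""
    steps = int(round(t_end / dt))
    delay = int(round(tau / dt))          # transmission delay in time steps
    x1 = [0.0] * (steps + 1)
    x2 = [0.0] * (steps + 1)
    z12 = [0.0] * (steps + 1)
    for k in range(steps):
        t = k * dt
        x1_delayed = x1[k - delay] if k >= delay else 0.0
        signal = positive_part(x1_delayed - gamma_x)   # prediction signal
        x1[k + 1] = x1[k] + dt * (-alpha * x1[k] + p1(t))
        x2[k + 1] = x2[k] + dt * (-alpha * x2[k] + p2(t)
                                  + beta * z12[k] * signal)
        z12[k + 1] = z12[k] + dt * (-u * z12[k]
                                    + v * signal * positive_part(x2[k] - gamma_x))
    return x1, x2, z12


def silent(t):
    """An input that is never excited."""
    return 0.0


# V_1 excited at t = 0.5; V_2 excited one delay later, or not at all.
_, _, z_paired = simulate(rect_pulse(0.5, 0.3, 10.0), rect_pulse(0.8, 0.3, 10.0))
_, _, z_alone = simulate(rect_pulse(0.5, 0.3, 10.0), silent)
```

In this sketch z_{12}(t) grows only when the prediction signal and the x_2(t) process overlap in time. When V_1 is excited alone from rest, z_{12}(t) stays at zero, so no association is learned.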

In order for the interneuron in figure 8.2.1 to excite the motoneuron
with spikes, there must be transmitter substance in the synaptic
clefts. If we make the amount of transmitter substance released by a
barrage of spikes proportional to β z_{12}(t) [x_1(t - τ) - Γ_x]^+ then the
equations governing the embedding field network could accurately
describe the nervous network. If we further made the amount of
transmitter substance available for release proportional to the amplitude
of z_{12}(t), then equation 8.3.3 could describe how, why, and how much
transmitter substance is available in the synaptic cleft. Grossberg
has proposed this as a concrete theoretical explanation of the
neurophysiological phenomena underlying learning in living organisms. His
proposal is that transmitter substance is produced in a synaptic
cleft at a rate proportional to the correlation of the frequency of
spikes arriving at the bouton and the membrane potential and/or spiking
frequency of the adjacent cell body. He has proposed additional
refinements and an exact mechanism which gives this result in reference

It is doubtful that the ability of an interneuron to excite a
motoneuron in the spinal column of vertebrates is learned. As we have
said, the neurons selected for figure 8.2.1 were selected because of the
extensive information that has been collected on them. However, the
arrangement of neurons in the medulla, cerebellum, and cerebrum of
vertebrates is similar and we do know that learning occurs in these
organs. The similarity between the embedding field network and the
nervous network in figure 8.2.1 is uncanny. Grossberg has shown
theoretically, and we have shown experimentally, that embedding field
networks can learn. Thus Grossberg's proposal could explain learning
in organisms at the microscopic level. The proposal is even more
attractive when it is recalled that embedding field theory originated
as a model for the macroscopic psychological phenomena of learning.

This thesis originally intended to simulate Grossberg's proposal
in detail and compare it to existing neurophysiological experimental
data. However, the time was not available. A simplistic stab was
made in this direction. The reason was that nervous networks are
capable of transmitting a signal through a cascade of neurons without
"pulse lengthening" occurring. To solve this problem in an embedding
field node cascade, an attempt was made to model the embedding field
elements more closely to neurophysiological elements. At the same time,
attempts  to  implement  logic  <C  ~  of  table  7»^-»l  in  an  outstar  were  being 
made.  The  simplistic  model  of  neurophysiological  phenomena  proved  to 
be  an  £„  logic.  Because  of  these  diverse  reasons,  the  simplistic 
model  arrived  at  in  this  thesis  is  quite  different  from  Grossberg's 
proposal.  In  the  next  section  we  shall  derive  this  model  in  a  somewhat 
logical manner. The reader may be assured that this was not the
historical progress of the model.

section 8.4     A Simplistic Model for the Neurophysiological Phenomena
in a Nervous Network Based on Embedding Field Theory

Suppose that we had two neurons, V_1 and V_2, arranged as in figure
8.2.1. Suppose further that excitement of the first neuron, V_1,
only results in one spike being generated per excitement. Also
suppose that we could excite the cell body of the second neuron, V_2,
with an input. As in embedding field theory, we are not concerned
here with how these inputs are delivered to the cell bodies. For the
sake of argument, suppose that transmitter substance is produced in
the synaptic cleft at a rate proportional to the correlation between
the membrane potential of a bouton and the membrane potential of the
adjacent cell body, V_2. For the purposes of this discussion, we will
assign a value of zero to the resting membrane potential at the bouton
and let x_2(t) be the membrane potential of the V_2 cell body. Let
z_{12}(t) be the "amount" or "concentration" of transmitter substance
present in the synaptic cleft. From our previous work we have a choice
of two formulations for z_{12}(t):

8.4.1  \dot{z}_{12}(t) = -u z_{12}(t) + v x_1(t - \tau) x_2(t)

and the more general formulation:

8.4.2  \dot{z}_{12}(t) = -u z_{12}(t) + v \delta_{-1}(M_z + z_{12}(t)) [ a (x_1(t - \tau) + x_2(t))^2 - b (x_1(t - \tau) - x_2(t))^2 ]

Now, we run into a problem. x_1(t - τ) and x_2(t) are voltages.
\dot{z}_{12}(t) is the rate of production of a chemical transmitter substance.

What are the chemical reactants which produce the transmitter substance?
How does it come about that a chemical substance is being produced at
a rate proportional to the product of voltages?

In our brief description of how membrane potentials come about,
we saw that these potentials are due to changes in the ionic
permeability of the neurons' membranes. Suppose that an ion or substance
diffuses or is released from the bouton when the membrane permeability
is changed by arrival of a spike. We will call this substance "B"
substance. Suppose further that a different ion or substance diffuses
or is released from the cell body when its membrane permeability is
changed by an EPSP or spike. We will call this substance "C" substance.
Suppose further that "B" substance and "C" substance are the reactants
which produce the transmitter substance. Since the transmitter substance
results in excitation of neuron V_2, we will call it "excitory transmitter
substance", or simply "E" substance.

How do the B and C substances combine to produce E substance,
and why would the rate for this reaction be proportional to the product
of voltages? The rate of reaction for biochemical reactions may be
governed by many things, including voltages. Due to the complexities
of biochemical processes, we could blatantly assume that the rate
of reaction for the combination of B and C substances into E substance
is proportional to the product, or the squares of the sum and difference,
of two voltages. However, we need not make this blatant assumption.
It is possible to allow B and C substances to combine according to a
very simple chemical reaction and this will result in all the desired
properties for production of E substance. The remainder of this section
will be devoted to this simple chemical reaction and its implications.

Let B and C substances combine to form E substance according to
the chemical reaction:

8.4.3    b·B + c·C → 1·E

where b is the number of moles of B and c is the number of moles
of C required to produce one mole of E.

Let this reaction occur instantly at body temperatures. That is,
if b moles of B and c moles of C are released into the synaptic cleft
at time t_0, then at any time t > t_0 only the end product of one mole
of E will be present in the cleft.

We will investigate the implications of equation 8.4.3 for the
production of E substance. The investigation will involve a number
of tricky conservation of reactants and end product equations. For
simplicity, we will make b = c = 1 in equation 8.4.3. That is:

8.4.4    1·B + 1·C → 1·E

Equation 8.4.4 will be used throughout. However, it must be kept
in mind that equation 8.4.3 is the general situation and that we will
be investigating a special case.
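
The bookkeeping implied by an instantaneous reaction can be sketched as below. This is an illustration only: the function name and the sample amounts are hypothetical, and the sketch simply runs the reaction of equation 8.4.3 to completion on whatever is present in the cleft.

```python
def react(moles_b, moles_c, b=1.0, c=1.0):
    """Run b*B + c*C -> 1*E to completion.

    Returns (B left over, C left over, E formed). With the default
    b = c = 1 this is the special case of equation 8.4.4.
    """
    e_formed = max(0.0, min(moles_b / b, moles_c / c))
    return moles_b - b * e_formed, moles_c - c * e_formed, e_formed


# Three moles of B meet one mole of C: all of the C is used up.
b_left, c_left, e_made = react(3.0, 1.0)
```

Whichever reactant is scarcer limits the amount of E formed, so at most one of the two reactants can remain in the cleft after the reaction.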

Let b_{12}(t) be the number of moles of B substance released from
a bouton into the synaptic cleft per second. Let c_{12}(t) be the number
of moles of C substance released from the cell body per second into
the cleft. We can relate b_{12}(t) and c_{12}(t) to the membrane potentials,
x_1(t - τ) and x_2(t).

The biochemical process which results in membrane potentials is
the selective permeability of the membranes. A positive increase in
a membrane potential is due to an increase in the membrane's
permeability to sodium ions, Na+. A decrease in membrane potential is due
to a decreased permeability to Na+ ions. As we discussed in section
8.2, the net effect of a spike or an EPSP on a neuron is a slight
increase of Na+ ions inside it and a compensating decrease of potassium,
K+. Now, suppose that B and C substances are held inside the membrane
when it is at rest potential. Suppose further that they diffuse
through the membrane with K+ ions to compensate for a net increase of
Na+ ions. Since K+ ions diffuse out of a membrane when the membrane
potential is decreasing, we can say that:

(a)  b_{12}(t) > 0    when \dot{x}_1(t - \tau) < 0

(b)  c_{12}(t) > 0    when \dot{x}_2(t) < 0

Since the rate of diffusion of K+ ions is proportional to the rate
of change of membrane potential, let us go a bit further and say that:

(c)  b_{12}(t) = [-\dot{x}_1(t - \tau)]^+

(d)  c_{12}(t) = [-\dot{x}_2(t)]^+

where:

[y]^+ = \begin{cases} y & \text{if } y > 0 \\ 0 & \text{if } y \le 0 \end{cases}

In other words, this says that the rate of release of B and C
into the synaptic cleft is directly proportional to the rate of decrease
of membrane potential.
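
Relations (c) and (d) can be sketched on sampled data as follows. This is an illustration under stated assumptions: the membrane potential is taken to be sampled at a fixed interval dt, its derivative is approximated by a backward difference, and the function name and sample trace are hypothetical.

```python
def release_rate(potential, dt):
    """Approximate [-dx/dt]^+ for a sampled membrane potential.

    The derivative is estimated by backward differences; the first
    sample has no predecessor and is given a rate of zero.
    """
    rates = [0.0]
    for previous, current in zip(potential, potential[1:]):
        slope = (current - previous) / dt
        rates.append(-slope if slope < 0 else 0.0)
    return rates


# A triangular trace: rising for two steps, falling for two, then at rest.
trace = [0.0, 1.0, 2.0, 1.0, 0.0, 0.0]
rates = release_rate(trace, dt=0.5)
```

Only the falling trailing edge of the trace produces a nonzero release rate; the rising leading edge and the resting portion produce none.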

Now, what happens to the B and C substances when they are released
into the synaptic cleft? If both are being released at the same time,
then E substance will be produced. This is exactly what we want. It
says that E substance will be produced if both x_1(t - τ) and x_2(t)
are decreasing at the same time. Although it ignores the increasing
leading edge of x_1(t - τ) and x_2(t), it does correlate the decreasing
trailing edges. Further, this process corresponds to known physical
facts. That is, when membrane potentials are decreasing, at least one
substance from inside the membrane is diffusing out of it.

However, there is a catch. Suppose a spike has excited the bouton
recently, but no EPSP or spike has excited the adjacent cell body.
Then B substance will have been released into the synaptic cleft and
there will be a net amount of it present for all time after arrival
of the spike at the bouton. Thus, if a few days later the adjacent
cell body is excited by an EPSP or a spike, E substance will be
produced. In embedding field terms, the association V_1 → V_2 will be
learned. One of the key tenets of embedding field theory is that
V_1 → V_2 can only be learned when V_1 and V_2 have been excited in close
temporal proximity. Thus, we cannot allow excess B or C substances
to accumulate in the cleft.

There are three methods of preventing excess B or C substance from
accumulating in the cleft. It can diffuse out of the cleft, it can be
readsorbed into the bouton or cell body from which it came, or it
can be rendered inactive by chemical reactions. There is no reason for
preferring one of these methods to another here. We will arbitrarily
choose the chemical reaction and say that B and C substances are
deactivated at a finite rate to prevent accumulation.

Let \bar{b}_{12}(t) be the number of moles of B in the cleft at time t.
Let \bar{c}_{12}(t) be the number of moles of C in the cleft at time t.
Then we will say that:

8.4.5  \dot{\bar{b}}_{12}(t) = -w_b \bar{b}_{12}(t)

8.4.6  \dot{\bar{c}}_{12}(t) = -w_c \bar{c}_{12}(t)

We now have a "correlating" process. The amount of E substance
in the cleft, z_{12}(t), will grow when a spike excites the bouton in
close temporal proximity to the excitement of the adjacent cell body.
It will not grow if they are not in close temporal proximity.
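
The effect of deactivation on this correlation can be sketched with a closed form calculation. This is a simplified illustration, not the model developed in the remainder of this section: it assumes a single release of B at time zero that decays exponentially at the deactivation rate w_b, followed by a single release of C after a given delay, with the instantaneous one-to-one reaction of equation 8.4.4. The function name and all numerical values are hypothetical.

```python
import math


def e_formed(delay, b_released=1.0, c_released=1.0, w_b=2.0):
    """Moles of E formed when C arrives `delay` seconds after B.

    B is deactivated exponentially at rate w_b while it waits in the
    cleft; whatever remains reacts one-to-one with the arriving C.
    """
    b_remaining = b_released * math.exp(-w_b * delay)
    return min(b_remaining, c_released)


near = e_formed(0.1)    # cell body excited shortly after the bouton
far = e_formed(10.0)    # cell body excited long after the bouton
```

With a finite deactivation rate, a long separation between the two excitements leaves almost no B in the cleft, so almost no E substance is produced.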

We must now develop a mathematical description of the production
of E substance in the cleft as a function of the membrane potentials
x_2(t) and x_1(t - τ). Thus far we have reached the following results:

8.4.7  i  •  B  +  1-  C  ^  1  •  E     (instantaneous  rate) 

8.4.8  b  (t)  =  C-x  (t  -t)]  + 

8.4.9  6  (t)  =  [  -x_(t)  1  + 
.12        2 

8.4.10  b,o(t)  =  -wjb  (t) 

8.4.11  c,  (t)  =  -w  c  (t) 

12       e  i2 

where:

$x_1(t - \tau)$ is the membrane potential of the bouton.

$x_2(t)$ is the membrane potential of the adjacent cell body.

$\dot{b}_{12}(t)$ is the number of moles of B released into the cleft from
the bouton per second.

$\dot{c}_{12}(t)$ is the number of moles of C released into the cleft from
the cell body per second.

$b_{12}(t)$ is the net number of moles of B in the cleft at time t.

$c_{12}(t)$ is the number of moles of C in the cleft at time t.

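Because of 8.4.7, either $b_{12}(t)$ or $c_{12}(t)$ is zero at any given
time. Also because of 8.4.7:

8.4.12  $\dot{z}_{12}(t) = [\min(b_{12}(t),\, c_{12}(t))]^+$

where:

$$[\min(x, y)]^+ = \begin{cases} x & \text{if } x \le y \text{ and } x > 0 \\ y & \text{if } y \le x \text{ and } y > 0 \\ 0 & \text{if } x \le 0 \text{ or } y \le 0 \end{cases}$$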

This simply says that if there is $b_{12}(t)$ of B substance in the cleft,
and we release $c_{12}(t) < b_{12}(t)$ of C substance into the cleft, then
instantaneously all of the C will be used up to produce E. $\dot{z}_{12}(t)$
is restricted to be positive because there simply cannot be a negative
number of moles of B or C in the cleft.
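
The clipped minimum used in 8.4.12 can be written as a one-line helper.
The following is a minimal Python sketch; the name `pos_min` is introduced
here for illustration and does not appear in the original text.

```python
def pos_min(x: float, y: float) -> float:
    """Return [min(x, y)]^+: the smaller argument, or 0 if either is <= 0."""
    return max(0.0, min(x, y))


if __name__ == "__main__":
    print(pos_min(3.0, 5.0))   # smaller positive argument is returned
    print(pos_min(-1.0, 5.0))  # a non-positive argument forces zero
    print(pos_min(2.0, 0.0))   # zero in either slot also gives zero
```

Writing it as `max(0, min(x, y))` covers all three cases of the definition
at once, since the minimum is non-positive exactly when x or y is.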

Equation 8.4.12 will describe $z_{12}(t)$ as a function of $x_1(t - \tau)$
and $x_2(t)$ if we can develop equations relating $b_{12}(t)$ and $c_{12}(t)$
to $x_1(t - \tau)$ and $x_2(t)$ respectively. Let us consider the
conservation of B in the cleft:

(i)  $\dot{b}_{12}(t)\,dt = [-\dot{x}_1(t - \tau)]^+\,dt$ of B substance is
released into the cleft per time interval dt.

(ii)  $[\min(b_{12}(t),\, c_{12}(t))]^+\,dt$ of B substance is converted to
E substance instantaneously in time interval dt.

(iii)  The reaction to produce E substance is instantaneous.
Therefore if there is $b_{12}(t)$ of B substance in the cleft
and $c_{12}(t)$ of C is added at $t = t_0$, then at $t > t_0$ there
can be at most $[b_{12}(t) - c_{12}(t)]^+$ of B in the cleft.
This is the amount of B substance which will be available
for deactivation. Therefore there is:

$$-w_b\,[b_{12}(t) - c_{12}(t)]^+\,dt$$

of B substance deactivated per time interval dt.

Therefore:

8.4.13  $\dot{b}_{12}(t) = [-\dot{x}_1(t - \tau)]^+ - w_b\,[b_{12}(t) - c_{12}(t)]^+ - [\min(b_{12}(t),\, c_{12}(t))]^+$

Similarly:

8.4.14  $\dot{c}_{12}(t) = [-\dot{x}_2(t)]^+ - w_c\,[c_{12}(t) - b_{12}(t)]^+ - [\min(b_{12}(t),\, c_{12}(t))]^+$

Equations 8.4.13 and 8.4.14 coupled with 8.4.12 completely describe
the process whereby the voltages $x_1(t - \tau)$ and $x_2(t)$ are converted
into the chemical substance E. As they are rather complicated, a system
diagram was drawn and is shown in figure 8.4.1.

A signal from one neuron is transmitted to another by the release
of transmitter substance in the synaptic cleft. Having developed a
model for the production of transmitter substance, we must now model
how this substance is used in the transmission of signals. Let us
assume that the transmitter substance produced by our reaction is
contained in the vesicles in the synaptic cleft. Under normal
circumstances, it is safely packaged in these vesicles and unable to affect
the permeability of the adjacent cell body membrane. However, when
a spike arrives at the bouton, the vesicles suddenly burst and the
transmitter is released to attack the cell body membrane. How does
the spike cause the vesicles to burst?

Again, since we are dealing with a biochemical system, there is no
obvious method. Let us consider the events associated with the arrival
of the spike at the bouton and see if there is any reason for the
vesicles to burst. Arrival of the spike at the bouton begins with a
rapid diffusion of Na+ ions into the bouton. Here we have two possible
reasons for the vesicles to burst. Firstly, before the arrival of
the spike, the bouton and the cell body are at zero potential relative
to one another. When the spike begins to arrive at the bouton, the
potential of the bouton rapidly increases relative to the potential of
the cell body. Thus we could conceive of the vesicles being pulled apart
by electrostatic forces. This would require dipolar vesicles: one end
of the vesicle would have to be at a different potential with respect
to the other end. If transmitter were released by this method, then it
would most likely be released before the spike peaks.

On the other hand, we could conceive of the vesicles bursting
due to the sudden infusion of Na+ into the bouton. The detailed
mechanism would require that the normal Na+ concentration in the
synaptic cleft be greater than that inside the bouton, as is the case
with the interstitial fluid surrounding the neuron. Then the beginning
of the arrival of a spike at the bouton would cause the Na+ to diffuse
out of the cleft into the bouton. Since the volume of the cleft is
small compared to that of the bouton, this process would rapidly deplete
the cleft of Na+. If sodium is required to keep the vesicles together,
they would come apart when a spike arrives at the bouton. Another
mechanism that would have the same result would be to surround the
vesicles with a membrane that is permeable to Na+ and H2O. Then the
sudden depletion of Na+ in the cleft would also deplete the vesicles
of Na+. The result would be an osmotically compensating influx of
H2O into the vesicles. With sufficient Na+ depletion, enough H2O will
enter the vesicles to burst them, similarly to hemolysis in red blood
cells. (Ref. 12, p. 13) Again this method would most likely release
transmitter before the spike peaks.

We could conceive of other mechanisms to cause vesicles to burst.
However, we have two likely candidates which cause them to burst before
the spike peaks. Our process for the production of transmitter begins
to operate after the spike has peaked and begun to decay.

If the process which releases transmitter operates at the same
time as the production process, we will be releasing the transmitter
as we produce it. Thus, to make our system work well, we must separate
the transmitter release and production processes. For this practical
reason, and the fact that it could work, we will release transmitter
when the bouton membrane potential is increasing. That is, transmitter
will be released when:       $\dot{x}_1(t - \tau) > 0$

We must now decide how much transmitter is released. For simplicity
let us assume that all the transmitter in the cleft is released when
the bouton membrane potential begins to increase. We will further
assume that all the released transmitter immediately changes the
permeability of the adjacent cell body membrane and results in an
immediate increase in the cell body's membrane potential. Note that this
implies that arrival of the spike at the bouton causes an impulsive
excitement of the adjacent cell body.

We need to decide one further thing. Release of one mole of E
substance will result in a cell membrane potential of how many volts?
We will arbitrarily say that release of one mole of E will result
in a cell body membrane potential increase of a volts.

In summary, our transmitter releasing process does the following:
Suppose that there are $z_{12}(t)$ moles of E present in the cleft. Then any
increase of the bouton membrane potential above resting potential will
cause the release of all $z_{12}(t)$ moles of E. This will in turn cause an
immediate increase in the adjacent cell body membrane potential of
$a\,z_{12}(t)$ volts. The E released is used up causing the $x_2(t)$ membrane
potential to increase. Thus $z_{12}(t) = 0$ immediately after release of
the E.
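
The release rule summarized above can be sketched as a single update
step. This is only an illustration in Python: the function name
`release_transmitter` and the boolean flag `bouton_rising` (standing for
the condition that the bouton membrane potential is increasing) are
assumptions made here, not notation from the text.

```python
from typing import Tuple


def release_transmitter(z: float, x_cell: float, bouton_rising: bool,
                        a: float = 1.0) -> Tuple[float, float]:
    """Apply the all-or-nothing release rule once.

    z             -- moles of E stored in the cleft
    x_cell        -- membrane potential of the adjacent cell body (volts)
    bouton_rising -- True while the bouton potential is increasing
    a             -- volts of excitation per mole of E released

    Returns (z_after, x_cell_after).
    """
    if bouton_rising and z > 0.0:
        # All stored E is released and used up raising the cell potential.
        return 0.0, x_cell + a * z
    # Otherwise E stays stored and has no effect on the cell body.
    return z, x_cell


if __name__ == "__main__":
    print(release_transmitter(10.0, 0.0, bouton_rising=True))
    print(release_transmitter(10.0, 0.0, bouton_rising=False))
```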

In order to use this simplistic model for the production and release
of transmitter in the synaptic cleft, we must also model the membrane
potential responses of cell bodies and axons. At the beginning of the
modeling process, we said that we were only interested in the propagation
of a single spike across the synaptic cleft. Our model for the membrane
response at other parts of the system thus need only account for a single
spike. Rather than going through the laborious process of finding
processes which will exactly duplicate the membrane potential traces
shown in figure 8.2.1, we will adopt the formulation for x processes
at a node in an embedding field. Further, we will not consider thresholds
in this study.

With these assumptions, suppositions, and modeling results, we
are in a position to write down a complete set of equations governing
this simplistic model for a nervous network. We will summarise the
notations used and then write down the equations.

The equations and notations will be presented in a generalised
form. Since this is just a reformulation of the embedding field
network equations, we will number the cell bodies in a nervous network
and refer to them as the "$V_i$" cell body. All synapses between boutons
connected to the $V_i$ cell body by axons and the $V_j$ cell body will be
referred to by the dual subscript ij. The first subscript, i, shows
the direction a signal is coming from and the second subscript, j, shows
the direction it is traveling toward across the synapse.
Chemistry:

B substance is a chemical substance released from a bouton into
a synaptic cleft when the bouton's membrane potential is decreasing.

C substance is a chemical substance different from B substance
which is released from the cell body into synaptic clefts when the cell
body membrane potential is decreasing.

E substance is excitatory transmitter substance. It is produced
by the instantaneous reaction:

$$1 \cdot B + 1 \cdot C \rightarrow 1 \cdot E$$

At all times when the bouton membrane potential is at resting potential
or decreasing, the E substance is stored in the synaptic cleft and is
unable to affect membrane potentials. When the bouton membrane potential
is increasing, all the E substance in the cleft is immediately released.
When it is released it immediately causes an increase in the adjacent
cell body membrane potential of a volts per mole of E substance released.
The E substance released is used up causing the cell body membrane
potential to increase.
Variables:

$P_i(t)$ is an input signal delivered directly to the $V_i$ cell body
from the environment.

$x_i(t)$ is the cell body membrane potential of the $V_i$ cell body
in the nervous network.

$x_i(t - \tau)$ is the membrane potential of the boutons connected to
the $V_i$ cell body by axons.

$z_{ij}(t)$ is the number of moles of E substance present in the synaptic
cleft between boutons connected to the $V_i$ cell body by axons and the
$V_j$ cell body.

$b_{ij}(t)$ is the net amount of B substance in moles in the ij synaptic
cleft at time t.

$c_{ij}(t)$ is the net amount of C substance in moles in the ij synaptic
cleft at time t.
Constants:

$\alpha$ is the decay rate for membrane potentials.

$w_b$ is the deactivation rate for B substance in a synaptic cleft.

$w_c$ is the deactivation rate for C substance in a synaptic cleft.

a is the released transmitter effectiveness factor on a cell
body membrane potential. One mole of E substance released in a synaptic
cleft results in an increase in the adjacent cell body's membrane
potential of a volts.

$\tau$ is the interval between origination of a spike at a cell body
and its arrival at the boutons attached to that cell body by axons.

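The equations governing the system's performance:

8.4.15  $\dot{x}_i(t) = -\alpha\, x_i(t) + P_i(t) + a \sum_j R(\dot{x}_j(t - \tau))\, z_{ji}(t)$

where:

$$R(\dot{x}_j(t - \tau))\, z_{ji}(t)$$

is a special function defined by:

$$R(\dot{x}_j(t - \tau))\, z_{ji}(t) = \begin{cases} 0 & \text{if } \dot{x}_j(t - \tau) \le 0 \\ \text{an impulse of amplitude } z_{ji}(t) & \text{when } \dot{x}_j(t - \tau) > 0 \end{cases}$$

8.4.16  $\dot{z}_{ij}(t) = [\min(b_{ij}(t),\, c_{ij}(t))]^+ - R(\dot{x}_i(t - \tau))\, z_{ij}(t)$

where:

$$[\min(x, y)]^+ = \begin{cases} x & \text{if } x \le y \text{ and } x > 0 \\ y & \text{if } y \le x \text{ and } y > 0 \\ 0 & \text{if } x \le 0 \text{ or } y \le 0 \end{cases}$$

8.4.17  $\dot{b}_{ij}(t) = [-\dot{x}_i(t - \tau)]^+ - w_b\,[b_{ij}(t) - c_{ij}(t)]^+ - [\min(b_{ij}(t),\, c_{ij}(t))]^+$

where:

$$[y]^+ = \begin{cases} y & \text{if } y > 0 \\ 0 & \text{if } y \le 0 \end{cases}$$

8.4.18  $\dot{c}_{ij}(t) = [-\dot{x}_j(t)]^+ - w_c\,[c_{ij}(t) - b_{ij}(t)]^+ - [\min(b_{ij}(t),\, c_{ij}(t))]^+$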

section 8.5     Experiments with the Simplistic Neurophysiological
Model

Equations 8.4.15 through 8.4.18 look formidable. They were
simulated on a digital computer and it was experimentally verified
that they work. A simple network consisting of one neuron, $V_1$,
synapsing on another, $V_2$, was used. Since the method for the excitement
of a cell body by transmitter substance is an impulse, the external
inputs $P_1(t)$ and $P_2(t)$ were specified to be impulses of amplitude
10. The remaining parameters were selected arbitrarily to be:

$\alpha = 3.3333$ sec$^{-1}$

$\tau = 0.3$ sec

$w_b = \infty$

$w_c = 0.5$

$a = 1.0$

Figure 8.5.1 shows the results. The impulse input $P_2(t)$ was
presented to cell body $V_2$ exactly at the instant that the first spike
$x_1(t - \tau)$ arrived at the 1,2 synapse. Thus the signals $x_1(t - \tau)$
and $x_2(t)$ exactly correlated. Therefore the amount of B substance
entering the cleft per second was exactly equal to the amount of C
substance. Thus all the B and C substance was used up instantly to
produce E, as is shown by the zero $b_{12}(t)$ and $c_{12}(t)$ traces. The
amount of E produced was exactly enough to cause $x_1(t - \tau)$ to exactly
correlate with $x_2(t)$ on the second response. Again all B and C was used
up producing E and the amount of E produced was the same as before.

Since all the B and C substance was used up instantly to produce
E, we can analytically compute the traces in figure 8.5.1. The response
of $V_1$ to an impulse is:

$$x_1(t) = 10\, e^{-\alpha [t - 0.1]^+} \quad \text{for } t \ge 0.1 \text{ sec.}$$

Allowing for the transmission delay between cell body $V_1$ and the bouton,
the bouton membrane potential is:

$$x_1(t - \tau) = 10\, e^{-\alpha [t - 0.4]^+} \quad \text{for } t \ge 0.4 \text{ sec.}$$

The response of $V_2$ to the impulse $P_2(t)$ is:

$$x_2(t) = 10\, e^{-\alpha [t - 0.4]^+} \quad \text{for } t \ge 0.4 \text{ sec.}$$

The amount of B entering the synaptic cleft is:

$$\dot{b}_{12}(t) = [-\dot{x}_1(t - \tau)]^+ = 10\,\alpha\, e^{-\alpha [t - 0.4]^+} \quad \text{for } t \ge 0.4 \text{ sec.}$$

Similarly, the amount of C entering the cleft is:

$$\dot{c}_{12}(t) = [-\dot{x}_2(t)]^+ = 10\,\alpha\, e^{-\alpha [t - 0.4]^+} \quad \text{for } t \ge 0.4 \text{ sec.}$$

Since all the B and C entering the cleft is instantly used up
producing E,

$$\dot{z}_{12}(t) = \dot{b}_{12}(t) = \dot{c}_{12}(t)$$

or:

$$z_{12}(t) = \int_{0.4}^{t} 10\,\alpha\, e^{-\alpha [t' - 0.4]^+}\, dt' = 10\left(1 - e^{-\alpha [t - 0.4]^+}\right)$$

which is exactly what figure 8.5.1 shows.
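
As a quick numerical cross-check of the closed form for the exactly
correlated case, the release rate can be integrated directly. This is a
sketch only: the function names are invented here, and the midpoint rule
and step size are arbitrary choices, not the method of the original
simulation.

```python
import math

ALPHA = 3.3333  # membrane decay rate (1/sec), from the parameter list
T0 = 0.4        # time at which the spike reaches the bouton (sec)


def z12_closed_form(t: float) -> float:
    """Closed form 10 * (1 - exp(-alpha * [t - 0.4]^+))."""
    return 10.0 * (1.0 - math.exp(-ALPHA * max(t - T0, 0.0)))


def z12_numeric(t: float, dt: float = 1e-4) -> float:
    """Midpoint-rule integral of 10*alpha*exp(-alpha*(s - T0)) from T0 to t."""
    n = max(int(round((t - T0) / dt)), 0)
    total = 0.0
    for k in range(n):
        s = T0 + (k + 0.5) * dt  # midpoint of the k-th sub-interval
        total += 10.0 * ALPHA * math.exp(-ALPHA * (s - T0)) * dt
    return total


if __name__ == "__main__":
    for t in (0.5, 1.0, 2.0):
        print(t, z12_closed_form(t), z12_numeric(t))
```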

The second $x_2(t)$ response is due to release of E by the sudden
increasing leading edge of $x_1(t - \tau)$. This results in the instantaneous
release of all E in the cleft, as is shown by the z trace. From equation
8.4.15, the release of the E results in an instantaneous increase in the
amplitude of $x_2(t)$ to a value of $a\,z_{12}(t)$. As a = 1.0 and
$z_{12}(t) = 10.5$, $x_2(t)$ suddenly jumps to a value of 10.5 as shown. When
the sharp increasing leading edge of $x_1(t - \tau)$ is over, no more
transmitter is released and the production process begins to produce E
substance. The amplitudes are the same as in the first response and the
same amount of E is produced again. As long as the amplitude of the impulse
exciting $V_1$ is kept at a value of 10, the same traces will be produced
for as many excitements of $V_1$ as we desire. We will analytically
prove this statement shortly.

If we consider the traces $x_1(t)$, $x_1(t - \tau)$, and $x_2(t)$ to be
spikes, then the assumption that the input impulse amplitudes will remain
constant is realistic. Spikes are always of the same amplitude and
duration in a particular species of neurons. Note that once the
transmitter substance was formed, arrival of the spike $x_1(t - \tau)$ at
the bouton had the same effect as an input impulse on $x_2(t)$. Thus we
may consider that our input impulses, $P_1(t)$ and $P_2(t)$, are the effects
of spikes arriving at boutons synapsing on $V_1$ and $V_2$ which already
have 10 molar units of E substance present in their synaptic clefts.

Figure 8.5.1 shows the result of the special case of a spike
arriving at the 1,2 synapse at exactly the same time that $V_2$ is excited
by an input impulse. To check the ability of these networks to learn
when the input impulse to a cell body is delivered at a time different
from the instant that a spike arrives at the synapse, another experiment
was performed. One cell, $V_c$, was arranged so that it synapses on
5 other cell bodies in an outstar arrangement. The parameters in the
network were kept the same as in the previous experiment. Figure 8.5.2
shows the arrangement of the neurons and the results.

The amount of B substance in the clefts, $b_{ci}(t)$, was zero at all
times. This is because the deactivation rate for B, $w_b$, was infinite.
In the simulation, it was considered that the amount of B entering
the cleft in an infinitesimal time interval, dt, was made available
to react with any C present to form E. If there was any B left over
after this reaction, it was immediately deactivated before any more
B entered the cleft in the next infinitesimal time interval.

Figure 8.5.2.  A more complex experiment with the simplistic
neurophysiological model.

Nevertheless, figure 8.5.2 is a good look at the processes going
on in this model. The $c_{c1}(t)$ and $c_{c2}(t)$ traces show the
instantaneousness of the E production reaction. $V_1$ was excited by an
input impulse before the spike from $V_c$ arrived at the boutons. Thus C was
released into the c,1 synaptic cleft and began to be deactivated. When the
spike arrived at the c,1 bouton, B was released into the cleft. Since
there was more B being released into the cleft than there was C present
in the cleft, all the C was instantly used up producing E. Thus
$c_{c1}(t)$ suddenly drops to zero when the spike arrives at the c,1
bouton at t = 0.4. However, enough of the C released by $V_1$ had already
been deactivated when the spike arrived to allow $z_{c1}(t)$ to rise to
a value of only 5.

The traces associated with $V_2$ are exactly the same as those
associated with $V_2$ in the previous experiment. The spike arrived at the
c,2 bouton at exactly the same instant that the $P_2(t)$ input impulse
was delivered to $V_2$. Thus all the C and B released was used up producing
E.

The traces associated with $V_3$, $V_4$, and $V_5$ show what happens when
the input impulses are delivered to the cell bodies after arrival of
the spike at the boutons. Because of the infinite deactivation rate
for B, there was no accumulation of B in the cleft. Thus only the amount
of B entering the cleft when these cell bodies were excited is available
for reaction with C to form E. Remember that the B entering the
cleft is:

$$\dot{b}_{c3}(t) = [-\dot{x}_c(t - \tau)]^+$$

and the C entering the cleft is:

$$\dot{c}_{c3}(t) = [-\dot{x}_3(t)]^+$$

$V_3$ was excited by the input impulse $P_3(t)$ at time t = 0.5. The spike
arrived at the bouton at t = 0.4. Because of the infinite deactivation
rate for B, all the B which entered the cleft before t = 0.5 was
deactivated instantly. Thus the B available for reaction with the C
which begins to enter the cleft after t = 0.5 is:

$$\dot{b}_{c3}(t) = [-\dot{x}_c(t - \tau)]^+ = 10\,\alpha\, e^{-0.1\alpha}\, e^{-\alpha [t - 0.5]^+} \quad \text{for } t \ge 0.5$$

The C which enters the cleft after t = 0.5 is:

$$\dot{c}_{c3}(t) = [-\dot{x}_3(t)]^+ = 10\,\alpha\, e^{-\alpha [t - 0.5]^+} \quad \text{for } t \ge 0.5 \text{ sec.}$$

Thus, the amount of C entering the cleft is a factor of $e^{0.1\alpha}$
greater than the B entering the cleft. The reaction
$1 \cdot B + 1 \cdot C \rightarrow 1 \cdot E$ is instantaneous and the
coefficients of unity mean that $[\min(b_{c3}(t),\, c_{c3}(t))]^+$ of B is
converted to E immediately upon entering the cleft. Since $\dot{b}_{c3}(t)$
is less than $\dot{c}_{c3}(t)$, all of the B is converted to E. Knowing
this, we can analytically compute the amount of E produced:

$$\dot{z}_{c3}(t) = [\min(b_{c3}(t),\, c_{c3}(t))]^+ = \dot{b}_{c3}(t)$$

This last conclusion is a technical point. Since all the B entering
the cleft is immediately used up, there can be no accumulation of B
and $b_{c3}(t)$ is technically zero. However, in an infinitesimal time
interval, dt, $\dot{b}_{c3}(t)\,dt$ of B did enter the cleft. We must
hypothesize an infinitesimal accumulation of B in the cleft of:

$$db_{c3}(t) = \dot{b}_{c3}(t)\,dt$$

Since $db_{c3}(t) < c_{c3}(t)$ at all times, the amount of E produced during
the time interval dt is:

$$dz_{c3}(t) = db_{c3}(t) = \dot{b}_{c3}(t)\,dt$$

Thus $\dot{z}_{c3}(t) = \dot{b}_{c3}(t)$.

The E produced at any time t > 0.5 is:

$$z_{c3}(t) = \int_{0.5}^{t} \dot{b}_{c3}(t')\, dt' = 10\, e^{-0.1\alpha}\left(1 - e^{-\alpha [t - 0.5]^+}\right)$$

For times sufficiently greater than t = 0.5, the E produced is:

$$z_{c3}(t \gg 0.5) = 10\, e^{-0.1\alpha}$$

For $\alpha = 3.3333$, this gives us:

$$z_{c3}(t \gg 0.5) = 7.2$$

which agrees very well with the experimental results shown on the
$z_{c3}(t)$ traces in figure 8.5.2.
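
The step-by-step procedure described for the simulation (B entering in
each interval dt reacts with any C present, and leftover B is deactivated
at once because $w_b$ is infinite) can be sketched in a few lines of
Python. This is one possible reading of that procedure, not the original
program: the function names, the step size, the end time, and the order of
operations within a step are all assumptions made here.

```python
import math

ALPHA = 3.3333  # membrane decay rate (1/sec)
W_C = 0.5       # deactivation rate for C (1/sec)
AMP = 10.0      # amplitude of the input impulses and of the spike


def release_rate(t: float, t_start: float) -> float:
    """[-dx/dt]^+ for x(t) = AMP * exp(-ALPHA * (t - t_start)), t >= t_start."""
    if t < t_start:
        return 0.0
    return AMP * ALPHA * math.exp(-ALPHA * (t - t_start))


def simulate_cleft(t_spike: float, t_impulse: float,
                   t_end: float = 5.0, dt: float = 1e-4):
    """Return (z, c) at t_end for one synapse with w_b taken as infinite.

    t_spike   -- time the command spike reaches the bouton (releases B)
    t_impulse -- time the input impulse excites the cell body (releases C)
    """
    z = 0.0  # moles of E stored in the cleft
    c = 0.0  # moles of C accumulated in the cleft
    for k in range(int(round(t_end / dt))):
        t = k * dt
        b_in = release_rate(t, t_spike) * dt          # B entering this step
        c_avail = c + release_rate(t, t_impulse) * dt  # C present plus entering
        made = min(b_in, c_avail)                      # instantaneous B + C -> E
        z += made
        c = c_avail - made
        c -= W_C * c * dt                              # finite-rate deactivation of C
        # Any B left over (b_in - made) is discarded: w_b is infinite.
    return z, c


if __name__ == "__main__":
    z3, c3 = simulate_cleft(t_spike=0.4, t_impulse=0.5)
    print(z3, 10.0 * math.exp(-0.1 * ALPHA))
```

With the spike at t = 0.4 and the impulse at t = 0.5, the accumulated E
should come out close to the analytic value $10\,e^{-0.1\alpha}$, and with
both events at t = 0.4 it should approach 10, as in the first experiment.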

Since there was more C than B entering the c,3 cleft, and since
C was deactivated at a finite rate, there is an accumulation of C in
the cleft. The $c_{c3}(t)$ trace shows this accumulation and its
deactivation.

The traces associated with $V_4$ and $V_5$ are similar to those associated
with $V_3$. The only difference is that $V_4$ and $V_5$ were excited by input
impulses at progressively later times than $V_3$.

The second response shown on all the traces is a "prediction"
response. The command cell body, $V_c$, was excited by an input impulse
alone. The spike so generated traveled down the axons to the "grid"
cell bodies, $V_1$ through $V_5$. When it arrived, it instantly released all
the transmitter E substance in the synaptic cleft. Each of the "grid"
cell bodies was excited to a membrane potential of $z_{ci}(t = 2.2)$. In
this case, there was no time difference between the arrival of the
spike at the boutons and the excitement of the grid cell bodies.
Both events occurred at t = 2.2. Thus the amount of B being released
into the clefts which could react with C to form E was:

$$\dot{b}_{ci}(t) = [-\dot{x}_c(t - \tau)]^+ = 10\,\alpha\, e^{-\alpha [t - 2.2]^+} \quad \text{for } t \ge 2.2$$

However, the amount of C being released into the clefts at the same
time was:

$$\dot{c}_{ci}(t) = [-\dot{x}_i(t)]^+ = a\, z_{ci}(t = 2.2)\,\alpha\, e^{-\alpha [t - 2.2]^+} \quad \text{for } t \ge 2.2$$

In all cases, the amount of C being released was less than or equal to
the amount of B being released. Thus:

$$\dot{z}_{ci}(t) = a\, z_{ci}(t = 2.2)\,\alpha\, e^{-\alpha [t - 2.2]^+} \quad \text{for } t \ge 2.2$$

or:

$$z_{ci}(t) = a\, z_{ci}(t = 2.2) \int_{2.2}^{t} \alpha\, e^{-\alpha [t' - 2.2]^+}\, dt' = a\, z_{ci}(t = 2.2)\left(1 - e^{-\alpha [t - 2.2]^+}\right)$$

For t sufficiently greater than t = 2.2:

$$z_{ci}(t \gg 2.2) = a\, z_{ci}(t = 2.2)$$

which is what the $z_{ci}(t)$ traces in figure 8.5.2 show. Note that the
effect of a prediction excitement of the grid cell bodies is to produce
the exact amount of E after excitement as there was before the excitement.
In this sense, the network is self-sustaining. We can continue to excite
the grid cell bodies with prediction spikes for as long as we want.
The result will be the same as the prediction response shown.
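
The self-sustaining property can be illustrated by iterating the relation
$z_{ci}(t \gg 2.2) = a\, z_{ci}(t = 2.2)$ over repeated prediction spikes.
The Python sketch below is illustrative only. The function names are
invented here, and the cap at the command spike amplitude is an
extrapolation made for this sketch (it follows from taking the minimum of
the two release rates when C would exceed B, a case the text does not
treat).

```python
from typing import List


def next_z(z: float, a: float = 1.0, spike_amp: float = 10.0) -> float:
    """E stored after one prediction spike, given z stored before it.

    The text covers the case a * z <= spike_amp, where the result is a * z.
    The min() cap for a * z > spike_amp is an assumption of this sketch.
    """
    return min(a * z, spike_amp)


def repeat_predictions(z0: float, n: int, a: float = 1.0) -> List[float]:
    """Stored E before the first spike and after each of n prediction spikes."""
    zs = [z0]
    for _ in range(n):
        zs.append(next_z(zs[-1], a))
    return zs


if __name__ == "__main__":
    print(repeat_predictions(7.2, 3))          # a = 1: the stored amount is unchanged
    print(repeat_predictions(8.0, 3, a=0.5))   # a < 1: the stored amount halves each time
```

With a = 1.0, as in the experiment, every prediction leaves the stored
amount unchanged; under the same relation, a value of a below one would
make the learned pattern fade with each unreinforced prediction.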

Because the amount of B being released was always greater than or
equal to the amount of C being released during the prediction excitement,
there is no accumulation of C in the clefts. The $c_{ci}(t)$ traces are
therefore zero during the prediction excitement.

section 8.6     Inhibition and an L Logic

We now have a simplistic model of a nervous system that is a
synthesis of some neurophysiological facts, some assumptions, and
embedding field theory. Although much thought went into the modeling
process, we cannot pretend the model is accurate. The fact that the
model does work is a powerful argument for a deeper study of the
embedding field theoretical assumptions concerning learning at the
microscopic level in living organisms.

The time was not available for that deeper study. Shortly,
we will drop the neurophysiological names that have been attached
to the elements and processes in this model and consider it to be
an embedding field network only. Before we do so, there is one further
neurophysiological phenomenon occurring in nervous systems that we must
consider: inhibition. At the microscopic level, inhibition consists of
depressing the cell body membrane potential below the resting potential.
Figure 8.6.1 shows a common inhibiting arrangement in the spinal column
of vertebrates. The two large light neurons are motoneurons. The dark
neuron is a Renshaw cell. The sequence of events shown on the traces is
as follows: The cell body of motoneuron $V_1$ is excited by a spike. Its
membrane potential rises with an EPSP and a reflected spike. This spike is
propagated down $V_1$'s axon. A collateral breaks off of this axon and
synapses on the Renshaw cell's body. Arrival of the spike at this
synapse excites the Renshaw cell body, which fires a burst of spikes.
These spikes propagate up the Renshaw cell's axon. The Renshaw cell's
axon breaks up into two collaterals. One synapses on the $V_1$ cell body
and another synapses on the $V_2$ cell body. When the burst of spikes
arrives at these synapses, inhibitory transmitter is released. The
inhibitory transmitter causes a decrease in the membrane potentials of
$V_1$ and $V_2$ below resting potential. The membrane potential traces which
are below resting potential are called inhibitory post synaptic potentials
or IPSP's.

The important things we want to note from figure 8.6.1 are:

(a)  The Renshaw cell's body membrane potential increases in the
positive direction when it is excited.

(b)  The spikes propagated along the Renshaw cell's axon are
similar to the spikes along the motoneuron's axons. In particular
they are increases in the positive direction of the axon's membrane
potential.

(c)  A transmitter substance is released by these spikes. It
causes a decrease in the motoneuron's cell body membrane potential.
This decrease in membrane potential does not cause any change in the
motoneuron's axon membrane potentials.

These facts show that there is no negative membrane potential
propagated anywhere in the system. All propagating signals are positive
signals. In the discussion of allowable prediction signal states in
section 7.2, we did not allow the propagation of negative amplitude
prediction signals. We made this restriction on the grounds of
consistency and the fact that negative amplitude prediction signals
were not needed in an outstar. In the nervous system of living
organisms, negative amplitude "prediction" signals do not occur. Thus
our restriction on the allowable states of prediction signals in embedding
field networks is consistent with neurophysiological data.

The inhibitory transmitter substance released by the Renshaw
cell's burst of spikes is considered to be a chemical substance that
is different from the excitatory transmitter which excites the
motoneuron's cell bodies. There are at least three chemical substances
which act as transmitter in nervous systems. They are acetylcholine,
epinephrine, and norepinephrine. In one part of the body, and with one
species of neuron, one of the substances may act as an excitatory
transmitter and another may act as an inhibitory transmitter. In
another part of the body and with another species of neuron, their
effects may reverse.

With these few facts in mind, we will now invent a simplistic
model for inhibition which we shall add to our previous model. Firstly,
we will postulate an inhibitory transmitter substance H which is
different from our excitatory transmitter substance E. Since H and
E may reverse their roles in other parts of the nervous system, we
want the processes for production and release of H to be similar to
those for E. Therefore we will assume that H substance is stored in
the synaptic cleft. It is released when the adjacent bouton membrane
potential is increasing, i.e., when $\dot{x}(t - \tau) > 0$. We will further

assume that the release of one mole of H will result in an instantaneous
increase in the adjacent cell body membrane potential of $\gamma$ volts.
Note that this is an increase of $\gamma$ volts. We have specified that the
release of one mole of E will result in an instantaneous cell body
membrane potential increase of a volts. By specifying a or $\gamma$ positive
or negative, we can specify their effects in various parts of our
system. However, normally $\gamma$ will be assigned a negative value.
We must now invent a process which will produce H substance from
chemical substances available in the synaptic cleft. To do this we
will look closely at the Renshaw cell bouton-motoneuron synapse in
figure 8.6.1. The effect of H substance is a decrease in the
motoneuron's cell body membrane potential. This is caused by an increase
in the cell body membrane's permeability to K+ and Cl- ions. With
the sodium pump working to eject Na+, the net effect is an increase
of K+ ions inside the cell body. Remember that we allowed C substance
to be released when the cell body membrane potential was above
resting potential, but decreasing. When the cell body membrane potential
is above resting potential but decreasing, K+ ions are diffusing
out of the cell body. Thus we have in effect tied the release of C
substance to the diffusion of K+ ions out of the cell body. Now,
when the cell body's membrane potential is decreasing below resting
potential, K+ ions are diffusing into the cell body. Thus we may
assume that no C substance is being released into the synaptic cleft
when the cell body membrane potential is decreasing below resting
potential. We will make the further assumption that no C substance is
released at any time when the cell body's membrane potential is below
resting potential. Thus C substance can not be involved in the
production of H substance. We could postulate another chemical substance
which is released from the cell body into the synaptic cleft when the
cell body membrane potential is below resting potential. This is a valid
option, but we will not investigate it further.

Since the Renshaw cells' spikes are the same as all other spikes,
B substance is being released from the Renshaw cells' boutons. Thus
B substance could be a reactant in the production of H. Suppose that
there is a substance, S, which is always present in large quantities
in the synaptic cleft. Suppose further that S substance reacts with
B substance according to:

8.6.1     $1 \cdot B + 1 \cdot S \rightarrow 1 \cdot H$

Suppose further that this reaction is fast, but not as fast as
the reaction producing E substance. Then excitation of a bouton with
a spike will release B substance. If there is C substance present in
the cleft, then $[\min(b(t), c(t))]^{+}$ of E substance will be produced.
If there is any B left over after this reaction, it will combine
with S to form H. In the experiments of section 8.5 we saw that an
accumulation of B in the cleft is not necessary for learning. (The
accumulation of B in those experiments was always zero because the
deactivation rate for B, $w_b$, was infinite.) Further, if we make this
postulation, then the logic governing the performance of the elements
in the network will be an $\mathcal{L}_3$ logic.

An $\mathcal{L}_3$ logic conforms to the following tabulation:

Table 8.6.1

    $x_c$     $x_i$     $\mathcal{L}_3(x_c, x_i) = \dot{z}_{ci}$

     0         0            0
     0        +1            0
    +1         0           -1
    +1        +1           +1
    +1        -1           -1
     0        -1            0

In the current context, this tabulation means that there is no
transmitter substance, E or H, produced when the bouton membrane potential
is at resting potential, or $x_c = 0$. This is independent of whatever
the adjacent cell body membrane potential may be. However, when the
bouton membrane potential is above resting potential, there are three
cases. In two of them, when the adjacent cell body membrane potential
is at or below resting potential, inhibitory H transmitter substance is
formed. In the third, when the adjacent cell body membrane potential is
above resting potential, but decreasing, excitatory E substance is formed.

Thus the reaction $1 \cdot B + 1 \cdot S \rightarrow 1 \cdot H$ accomplishes one of the stated
aims of this chapter - the implementation of an $\mathcal{L}_3$ logic. We will
therefore adopt it as the chemical reaction producing H substance
in the model.
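The tabulation can be written out as a small lookup, which makes the three qualitative cases easy to check. This is a minimal sketch, not the thesis's program: the names `L3_TABLE`, `l3`, and `transmitter` are hypothetical, and the six entries assume the rows of Table 8.6.1 read (x_c, x_i) followed by the direction in which the "z" process is driven.

```python
# Hypothetical encoding of Table 8.6.1 (names are illustrative only).
# Key: (x_c, x_i) = (state of the prediction signal, state of the adjacent node).
# Value: +1 if E is formed, -1 if H is formed, 0 if nothing is formed.
L3_TABLE = {
    (0, 0): 0,
    (0, +1): 0,
    (+1, 0): -1,
    (+1, +1): +1,
    (+1, -1): -1,
    (0, -1): 0,
}


def l3(x_c, x_i):
    """Direction in which the z process is driven for the given states."""
    return L3_TABLE[(x_c, x_i)]


def transmitter(x_c, x_i):
    """Name of the substance formed in the cleft, or None if none is formed."""
    return {+1: "E", -1: "H", 0: None}[l3(x_c, x_i)]


if __name__ == "__main__":
    for states in L3_TABLE:
        print(states, transmitter(*states))
```

Negative prediction signals are not tabulated because the model does not allow them, so the lookup deliberately raises a `KeyError` for `x_c = -1`.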

The alert reader may have noticed that we have already accomplished
the third aim of this chapter. We have already invented a process which
does not cause "pulse lengthening" of a signal being transmitted through
a neuron or embedding field node cascade. Consider a cascade of N
neurons, $V_1, V_2, \ldots, V_N$. The $V_{j-1}$ neuron synapses on the $V_j$
neuron. The $V_1$ neuron is the "starting" neuron. Suppose that each of
the $j-1,j$ synaptic clefts in the cascade contains A moles of E substance.
Let the E effectiveness factor, $a$, be $a = 1.0$. For simplicity let the H
effectiveness factor, $\theta$, be $\theta = 0$, so that we do not have to worry
about inhibition. Let the "starting" neuron, $V_1$, be excited by an input
impulse of amplitude A at time $t_1$. Then:

$$x_1(t) = \begin{cases} 0 & \text{for } t < t_1 \\ A e^{-\alpha (t - t_1)} & \text{for } t \ge t_1 \end{cases}$$

This signal will arrive at the 1,2 synapse at $t_1 + \tau$. It will cause
the release of all the E substance present. Thus:

$$x_2(t) = \begin{cases} 0 & \text{for } t < t_1 + \tau \\ A e^{-\alpha (t - t_1 - \tau)} & \text{for } t \ge t_1 + \tau \end{cases}$$

Since $x_2(t)$ and $x_1(t - \tau)$ are identical, A moles of E will be
produced in the 1,2 cleft by the E production process after $t = t_1 + \tau$.
The same argument holds for each pair of neurons, $V_{j-1}$, $V_j$, in the
cascade. Thus:

$$x_j(t) = \begin{cases} 0 & \text{for } t < t_1 + (j-1)\tau \\ A e^{-\alpha (t - t_1 - (j-1)\tau)} & \text{for } t \ge t_1 + (j-1)\tau \end{cases}$$

Except for the time delay, $(j-1)\tau$, the signal is transmitted through
the cascade unchanged. There is no "pulse lengthening". Additionally,
the self-sustaining property of the E production ensures that we can
propagate any number of signals through the cascade without distortion.
(Note, this last statement is true only if there is a time interval
between consecutive signals which is large enough to allow the E
production process to produce approximately A moles of E before the next
signal is started at the "starting" neuron. In practice, making this
interval $3/\alpha$ seconds is sufficient.)

The reason that such a cascade does not distort a signal is simple.
The input signal to the "starting" neuron is an impulse. The "prediction"
input signal to all the cell bodies in the cascade is also an impulse.
This is because the effect of the release of A moles of E in a synaptic
cleft is an instantaneous increase of the adjacent cell body membrane
potential of A volts. The effect of an input impulse is an instantaneous
increase of the cell body membrane potential of A volts. Thus the
effects of an input impulse and a "prediction" excitement are the same.
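The absence of pulse lengthening can be illustrated by evaluating the cascade response directly. The sketch below is only an illustration under stated assumptions: the function name `cascade_signal` is hypothetical, the response of each neuron is taken to be the delayed exponential derived above, and the default values A = 10, alpha = 3.333 and tau = 0.3 are borrowed from the experiment of section 8.7.

```python
import math


def cascade_signal(j, t, A=10.0, alpha=3.333, tau=0.3, t1=0.0):
    """Response x_j(t) of the j-th neuron (1-based) in the cascade.

    Assumes each cleft holds A moles of E, a = 1.0 and no inhibition,
    so every neuron receives the same impulse, delayed by (j - 1) * tau.
    """
    onset = t1 + (j - 1) * tau
    if t < onset:
        return 0.0
    return A * math.exp(-alpha * (t - onset))


if __name__ == "__main__":
    # The third neuron, observed 0.2 s after its own onset, matches the
    # first neuron observed 0.2 s after the input impulse.
    print(cascade_signal(1, 0.2), cascade_signal(3, 0.6 + 0.2))
```

Because every neuron's response is the same function of the time elapsed since its own onset, the pulse shape is identical at every stage of the cascade.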

Having modeled an arbitrary mechanism, we will now drop the
neurophysiological names assigned to the elements and processes of the
nodes and replace them with embedding field names. To do so, we must
add a "synaptic cleft" between the arrowheads of the embedding field
theory and the adjacent node. This is added to give us a definite
place for the chemical reactions we have invented to occur. We will
denote the synaptic cleft between the $N_{ji}$ arrowhead and the $V_i$ node
by $S_{ji}$. Because our model works according to chemical reactions, we
will call a network composed of elements from this model a chemical
embedding field network. We list here a complete description of the
processes. There are several new variables in the following equations.
They are defined after the equations.

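Equations for the chemical embedding field network processes:

8.6.2     $\dot{x}_i(t) = -\alpha x_i(t) + P_i(t) + a \sum_j R(\dot{p}_j(t - \tau))\, z_{ji}(t) + \theta \sum_j R(\dot{p}_j(t - \tau))\, h_{ji}(t)$

where $R(\dot{p}_j(t - \tau))\, y(t)$ is a special function defined by:

$$R(\dot{p}_j(t - \tau))\, y(t) = \begin{cases} 0 & \text{if } \dot{p}_j(t - \tau) \le 0 \\ \text{an impulse of amplitude } y(t) & \text{when } \dot{p}_j(t - \tau) > 0 \end{cases}$$

8.6.3     $p_j(t - \tau) = [x_j(t - \tau)]^{+}$

8.6.4     $\dot{z}_{ji}(t) = [\min(b_{ji}(t), c_{ji}(t))]^{+} - R(\dot{p}_j(t - \tau))\, z_{ji}(t)$

where:

$$[\min(x, y)]^{+} = \begin{cases} x & \text{if } x \le y \text{ and } x > 0 \\ y & \text{if } y \le x \text{ and } y > 0 \\ 0 & \text{if } x \le 0 \text{ or } y \le 0 \end{cases}$$

8.6.5     $\dot{h}_{ji}(t) = \left[ b_{ji}(t) - [\min(b_{ji}(t), c_{ji}(t))]^{+} \right]^{+} - R(\dot{p}_j(t - \tau))\, h_{ji}(t)$

8.6.6     $\dot{b}_{ji}(t) = [-\dot{p}_j(t - \tau)]^{+} - [\min(b_{ji}(t), c_{ji}(t))]^{+}$

where:

$$[y]^{+} = \begin{cases} y & \text{if } y > 0 \\ 0 & \text{if } y \le 0 \end{cases}$$

8.6.7     $\dot{c}_{ji}(t) = \left[ -\frac{d}{dt} [x_i(t)]^{+} \right]^{+} - w_c\, [c_{ji}(t) - b_{ji}(t)]^{+} - [\min(b_{ji}(t), c_{ji}(t))]^{+}$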
Definition of the variables:

$x_i(t)$ is the conventional x process which occurs at node $V_i$.

$p_j(t - \tau)$ is the prediction signal at the arrowheads connected
to node $V_j$ by directed edges. Since we do not allow negative amplitudes
for prediction signals, $p_j(t - \tau) = [x_j(t - \tau)]^{+}$. Only the first
derivative, $\dot{p}_j(t - \tau)$, is used in the above equations.

$P_i(t)$ is the conventional event input impulse. In this study of
the chemical embedding field networks, $P_i(t)$ will be constrained to
be an impulse of amplitude A.

$z_{ji}(t)$ is the amount of excitatory transmitter substance, E, in the
$S_{ji}$ synaptic cleft.

$h_{ji}(t)$ is the amount of inhibitory transmitter substance, H, in
the $S_{ji}$ synaptic cleft.

$b_{ji}(t)$ is the amount of B substance in the $S_{ji}$ synaptic cleft.

$c_{ji}(t)$ is the amount of C substance in the $S_{ji}$ synaptic cleft.

Definition of the constants:

$\alpha$ is the decay rate for x processes.

$a$ is the effectiveness factor for E substance. Release of 1 unit
of E in the synaptic cleft will result in an instantaneous increase
in the amplitude of the adjacent node's x process of $a$.

$\theta$ is the effectiveness factor for H substance. Release of 1 unit
of H in the synaptic cleft will result in an instantaneous increase in
the amplitude of the adjacent node's x process of $\theta$. $\theta$ will have
negative values throughout the rest of this study.

$\tau$ is the transmission delay due to finite transmission velocities
on directed edges. A signal which originates at the $V_j$ node at time $t_1$
will arrive at the arrowheads connected to this node at time $t_1 + \tau$.

$w_c$ is the rate constant for deactivation of C substance.

Discussion:

Equations 8.6.2 through 8.6.7 are a mathematical description of
the processes we have invented in this chapter. They are different
from equations 8.4.15 through 8.4.18 because they include the addition
of the inhibitory processes.

The functions $a \sum_j R(\dot{p}_j(t - \tau))\, z_{ji}(t)$ and $\theta \sum_j R(\dot{p}_j(t - \tau))\, h_{ji}(t)$
in equation 8.6.2 say that when the prediction signal $p_j(t - \tau)$
arriving at the $N_{ji}$ arrowhead is increasing, all the E and H substances
in the $S_{ji}$ synaptic cleft are released instantly. The release of these
substances at time $t_0$ causes an instant increase in the amplitude of
the adjacent $x_i(t)$ process of $a z_{ji}(t_0) + \theta h_{ji}(t_0)$.

E substance is produced in the synaptic cleft according to the
instantaneous reaction:

$1 \cdot B + 1 \cdot C \rightarrow 1 \cdot E$

Because of the unit coefficients in this equation, the maximum amount
of E that can be produced at any time is the minimum of the reactants
available. Equation 8.6.6 says that the amount of B being released
into the $S_{ji}$ cleft per second is $[-\dot{p}_j(t - \tau)]^{+}$. That is, the amount
of B being released from the $N_{ji}$ arrowhead into the $S_{ji}$ cleft is
directly proportional to the decrease per second in the amplitude of
the prediction signal at the $N_{ji}$ arrowhead. The B substance thus released
is first made available for reaction with C to form E. If there is
any B left over after this reaction, it reacts with S substance to form
H substance. S substance is always present in large quantities in the
cleft. Equation 8.6.5 says this mathematically.

The amount of C released into the $S_{ji}$ cleft per second is directly
proportional to the decrease per second in the amplitude of the adjacent
$V_i$ node's x process, provided that x process is positive. The term
$\left[ -\frac{d}{dt} [x_i(t)]^{+} \right]^{+}$ in equation 8.6.7 says this. The amount of C present
in the cleft is first made available to react with any B present to
form E. If there is any C left over after this reaction, it is
deactivated at rate $w_c$. Equation 8.6.7 states this mathematically.

Although equations 8.6.3 through 8.6.7 are complicated and
describe a complicated set of simultaneous processes, they are fairly
straightforward to simulate on a digital computer. In the next
section, we shall study an outstar network governed by these equations.
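The bracket operators that appear throughout these equations are simple to express in code. The following is a minimal sketch of the production terms only, assuming the definitions given with equations 8.6.4 through 8.6.6; the names `pos`, `min_pos`, and `production_rates` are hypothetical, and the release terms (the R functions) are deliberately left out.

```python
def pos(y):
    """[y]^+ : y if y > 0, otherwise 0."""
    return y if y > 0 else 0.0


def min_pos(x, y):
    """[min(x, y)]^+ : the smaller argument if both are positive, else 0."""
    if x <= 0 or y <= 0:
        return 0.0
    return min(x, y)


def production_rates(b, c):
    """Production terms of equations 8.6.4 and 8.6.5 for one cleft.

    b, c: amounts of B and C currently available for reaction.
    Returns (rate of E formed, rate of H formed). B reacts with C first;
    only the B left over reacts with the abundant S substance to give H.
    """
    e_rate = min_pos(b, c)
    h_rate = pos(b - e_rate)
    return e_rate, h_rate


if __name__ == "__main__":
    print(production_rates(3.0, 1.0))  # more B than C: some E, remainder H
    print(production_rates(1.0, 3.0))  # more C than B: only E
    print(production_rates(2.0, 0.0))  # no C at all: only H
```

Keeping the two operators as separate helpers mirrors the way the equations reuse them, so a full simulation would only need to add the impulse release and the C deactivation term.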
Section 8.7     A Chemical Outstar

An outstar composed of chemical embedding field elements was
set up. The standard experiment that has been performed in the other
outstars studied was performed. The event inputs to the nodes were
specified to be impulses of amplitude A = 10. From equations 8.6.2
through 8.6.7, there are five network parameters to be specified:
$\alpha$, $\tau$, $a$, $\theta$, and $w_c$. $\alpha$ and $\tau$ were specified as in the past:

$\alpha = 3.333\ \text{sec}^{-1}$

$\tau = 0.3\ \text{sec}$

The deactivation rate for C substance, $w_c$, was arbitrarily specified
to be:

$w_c = 0.5\ \text{sec}^{-1}$

Since an excitatory transmitter (E) substance effectiveness factor
of $a = 1.0$ has resulted in self-sustaining systems in the past, $a$ was
specified to be:

$a = 1.0$

The specification of the new inhibitory transmitter (H) substance
effectiveness factor, $\theta$, will require some discussion. The chemical
outstar conforms to the logic $\mathcal{L}_3$ tabulated in table 8.6.1. The three
assignments in that table which can cause the "z" processes to be
driven to non-ambient states are:

8.7.1     $\mathcal{L}_3(x_c = +1,\ x_i = +1) = +1$

8.7.2     $\mathcal{L}_3(x_c = +1,\ x_i = 0) = -1$

8.7.3     $\mathcal{L}_3(x_c = +1,\ x_i = -1) = -1$

In the current context, 8.7.1 says that an excited prediction signal
at an arrowhead and an excited x process at the adjacent node result
in the production of E substance. This is equivalent to driving a
"z" process in the excitatory direction. The other two assignments say
that when the prediction signal is in an excited state and the adjacent
node x process is in an ambient or inhibited state, H substance will be
produced. This is equivalent to driving a conventional "z" process
in the inhibitory direction. In the last chapter, we introduced the
idea that an outstar may have its grid and command nodes randomly
excited before it "goes to school" to learn a pattern. According to
the $\mathcal{L}_3$ logic, random excitement of the grid nodes can not change the
"z" process state. However, random excitement of the command node
can result in the outstar learning to directly inhibit all the grid
nodes according to assignments 8.7.2 and 8.7.3. In a real environment
we can expect this to be the case before the outstar "goes to school".
Thus the outstar will be inhibitorily biased before we try to teach
it a pattern. We must ensure that this inhibitory biasing is not so
great as to prevent the outstar from learning a pattern.

To facilitate this discussion, we will prove the following lemma:

Lemma 8.7.1

Let a node $V_1$ have an arrowhead $N_{12}$ impinging on another node $V_2$.
Let the functions $z_{12}(t) = h_{12}(t) = 0$. Let node $V_1$ be excited by a
positive impulse of amplitude $A_1$ at time $t_1$. Let node $V_2$ be excited at
time $t_2 = t_1 + \tau$ by an input impulse of amplitude $A_2$ which may be
negative. Then the amount of H substance in the $S_{12}$ synaptic cleft
at times $t \gg t_1 + \tau = t_2$ is:

$$h_{12}(t \gg t_2) = \begin{cases} 0 & \text{if } 0 < A_1 \le A_2 \\ A_1 - A_2 & \text{if } 0 < A_2 < A_1 \\ A_1 & \text{if } A_2 \le 0 \end{cases}$$

And the amount of E substance in the $S_{12}$ synaptic cleft at times
$t \gg t_1 + \tau = t_2$ is:

$$z_{12}(t \gg t_2) = \begin{cases} A_1 & \text{if } 0 < A_1 \le A_2 \\ A_2 & \text{if } 0 < A_2 < A_1 \\ 0 & \text{if } A_2 \le 0 \end{cases}$$

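Proof:

The input impulse to node $V_1$ results in a prediction signal arriving
at the arrowhead at time $t_1 + \tau = t_2$ which is:

$$p_{12}(t) = \begin{cases} 0 & \text{for } t < t_2 \\ A_1 e^{-\alpha [t - t_2]} & \text{for } t \ge t_2 \end{cases}$$

The input impulse to node $V_2$ results in an $x_2(t)$ process:

$$x_2(t) = \begin{cases} 0 & \text{for } t < t_2 \\ A_2 e^{-\alpha [t - t_2]} & \text{for } t \ge t_2 \end{cases}$$

The amount of B substance released into $S_{12}$ is:

$$b_{12}(t) = [-\dot{p}_{12}(t)]^{+} = \begin{cases} 0 & \text{for } t < t_2 \\ A_1 \alpha e^{-\alpha [t - t_2]} & \text{for } t \ge t_2 \end{cases}$$

The amount of C substance released into $S_{12}$ is:

$$c_{12}(t) = \left[ -\frac{d}{dt} [x_2(t)]^{+} \right]^{+} = \begin{cases} 0 & \text{for } t < t_2 \\ 0 & \text{for } t \ge t_2 \text{ if } A_2 \le 0 \\ A_2 \alpha e^{-\alpha [t - t_2]} & \text{for } t \ge t_2 \text{ if } A_2 > 0 \end{cases}$$

Thus the amount of E being formed is:

$$z_{12}(t) = \int_{t_2}^{t} [\min(b_{12}(s), c_{12}(s))]^{+}\, ds = \begin{cases} 0 & \text{for } t \ge t_2 \text{ if } A_2 \le 0 \\ A_2 (1 - e^{-\alpha [t - t_2]}) & \text{for } t \ge t_2 \text{ if } 0 < A_2 < A_1 \\ A_1 (1 - e^{-\alpha [t - t_2]}) & \text{for } t \ge t_2 \text{ if } 0 < A_1 \le A_2 \end{cases}$$

Thus for $t \gg t_2$:

$$z_{12}(t \gg t_2) = \begin{cases} 0 & \text{if } A_2 \le 0 \\ A_2 & \text{if } 0 < A_2 < A_1 \\ A_1 & \text{if } 0 < A_1 \le A_2 \end{cases}$$

The amount of H being formed is equal to the amount of B left over
after the E production reaction.

$$h_{12}(t) = \int_{t_2}^{t} \left[ b_{12}(s) - [\min(b_{12}(s), c_{12}(s))]^{+} \right]^{+} ds = \begin{cases} 0 & \text{if } A_1 \le A_2 \text{ for all } t \\ (A_1 - A_2)(1 - e^{-\alpha [t - t_2]}) & \text{if } 0 < A_2 < A_1 \text{ for } t \ge t_2 \\ A_1 (1 - e^{-\alpha [t - t_2]}) & \text{if } A_2 \le 0 \text{ for } t \ge t_2 \end{cases}$$

Thus for $t \gg t_2$:

$$h_{12}(t \gg t_2) = \begin{cases} 0 & \text{if } A_1 \le A_2 \\ A_1 - A_2 & \text{if } 0 < A_2 < A_1 \\ A_1 & \text{if } A_2 \le 0 \end{cases}$$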
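The limiting amounts in the lemma can be checked numerically by integrating the release rates used in the proof. This is a rough sketch under stated assumptions, not the thesis's FOCAL program: the function name `lemma_amounts`, the midpoint-rule step `dt`, and the integration horizon `t_end` are all hypothetical choices, and the decay rate defaults to the value 3.333 used in the experiment.

```python
import math


def lemma_amounts(A1, A2, alpha=3.333, dt=1e-4, t_end=5.0):
    """Numerically integrate E and H production after t2.

    B is released at rate A1*alpha*exp(-alpha*s), C at rate
    A2*alpha*exp(-alpha*s) when A2 > 0 (and not at all otherwise),
    where s is the time elapsed since t2. E forms at the smaller of the
    two rates; H forms from whatever B is left over.
    """
    e_total = 0.0
    h_total = 0.0
    steps = int(t_end / dt)
    for n in range(steps):
        s = (n + 0.5) * dt  # midpoint of the n-th interval
        decay = alpha * math.exp(-alpha * s)
        b = A1 * decay
        c = A2 * decay if A2 > 0 else 0.0
        e_rate = min(b, c) if (b > 0 and c > 0) else 0.0
        h_rate = max(b - e_rate, 0.0)
        e_total += e_rate * dt
        h_total += h_rate * dt
    return e_total, h_total


if __name__ == "__main__":
    for A2 in (12.5, 5.0, -5.0):
        print(A2, lemma_amounts(10.0, A2))
```

With A1 = 10, the three calls exercise the three cases of the lemma in turn: the node input exceeds the prediction amplitude, falls between zero and it, and is negative.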

Note that by lemma 8.7.1, $h_{12}(t \gg t_2) + z_{12}(t \gg t_2) = A_1$. Also
note that immediately after arrival of a prediction signal at the $N_{12}$
arrowhead, $h_{12}(t) = z_{12}(t) = 0$. Thus lemma 8.7.1 applies to the situations
where there is H and/or E substance present in $S_{12}$ before arrival of
a prediction signal. Since arrival of a prediction signal causes
the equivalent of input impulses of amplitudes $a z_{12}(t)$ and $\theta h_{12}(t)$
to be delivered to $V_2$, this lemma can be used in all cases by setting
$A_2 = a z_{12}(t) + \theta h_{12}(t) + A_I$, where $A_I$ is the amplitude of an external
input impulse, if any.

Now, suppose we start our outstar in a state of initial ignorance.
That is, $z_{ci}(0) = h_{ci}(0) = 0$. Let $0 > \theta > -1$. We then excite the
command node with an input impulse of amplitude A without exciting the
grid nodes. By lemma 8.7.1, $h_{ci}(t) = A$ and $z_{ci}(t) = 0$. Suppose we
excite the command node again without exciting the grid nodes. When
the prediction signal arrives at the $N_{ci}$ arrowheads, all the transmitter
substance is released. Thus the grid nodes are excited by impulses
of amplitude $\theta A < 0$. Then by lemma 8.7.1, A units of H will be produced
in the synaptic clefts, $S_{ci}$. Thus, before the outstar "goes to school",
the synaptic clefts contain A units of H and 0 units of E.

Now let the outstar "go to school". The command node and the grid
nodes are excited with input impulses of amplitude A. The command
prediction signal will cause a further impulse of amplitude $\theta A < 0$
to excite each of the grid nodes. Thus the grid nodes will be excited
by a total input of $A(1 + \theta)$. Thus $A(1 + \theta)$ of E will be produced.
(Remember that $0 > \theta > -1$.)

Suppose we want the outstar to be able to directly inhibit a single
occurrence of a random mistake. To the outstar, the first presentation
of the pattern after going to school is considered a random mistake.
Thus we want as much H produced as E. On this criterion, $\theta = -0.5$
is specified. Now, let us present the pattern a second time. The total
input impulse amplitude to the grid nodes in the pattern will be
$A(1 + \theta) + A\theta + A = 0.5A - 0.5A + A = A$. Thus A units of E will be
produced on the second presentation of the pattern. 0 units of H will
be produced.

Thus by specifying $\theta = -0.5$, we will have an outstar that is
resistant to single occurrences of random mistakes, but will learn
a pattern well in two presentations. Therefore, for the experiment,
$\theta$ is specified to be:

$\theta = -0.5$
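The sequence of presentations discussed above can be followed with a few lines of bookkeeping. This is a hedged sketch only: the function names `lemma_8_7_1` and `present` are hypothetical, the closed-form limits of lemma 8.7.1 stand in for the full differential equations, and the values A = 10, a = 1.0 and an inhibitory effectiveness factor of -0.5 (called `theta` here) are taken from the specification above. The sketch applies the lemma literally at each step, so the intermediate grid input it computes may differ from the hand calculation in the text; the amounts of E and H it reports are the ones quoted there.

```python
def lemma_8_7_1(A1, A2):
    """Long-run (E, H) amounts in a cleft, following lemma 8.7.1."""
    if A2 <= 0:
        return 0.0, A1
    if A2 < A1:
        return A2, A1 - A2
    return A1, 0.0


def present(z, h, external, A=10.0, a=1.0, theta=-0.5):
    """One command-node excitement seen from a single grid node's cleft.

    z, h: amounts of E and H in the cleft before the prediction arrives.
    external: amplitude of the event impulse at the grid node (0 if none).
    The cleft is emptied on arrival and refilled as the lemma prescribes.
    """
    A2 = a * z + theta * h + external
    return lemma_8_7_1(A, A2)


if __name__ == "__main__":
    z, h = 0.0, 0.0
    for label, external in [("command alone", 0.0), ("command alone", 0.0),
                            ("first lesson", 10.0), ("second lesson", 10.0)]:
        z, h = present(z, h, external)
        print(label, "E =", z, "H =", h)
```

Run in this order, the cleft holds only H after each lone command excitement, equal amounts of E and H after the first lesson, and only E after the second.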
Figure 8.7.1 shows the results of the first part of the experiment.
The command node is excited once alone at the beginning of the
experiment. The z traces show that no E was produced in the synaptic
clefts. The h traces show that 10 units of H were produced in the clefts.
Thus the outstar is inhibitorily biased before "going to school".
"School" begins with the second command node excitement. Event 2 is
presented exactly when the command prediction signal arrives at the
arrowheads. Event 3 is presented $2/\alpha = 0.6$ seconds later. The
pattern is presented twice.

In both presentations of the pattern, significantly more H is
produced in the $S_{c3}$ cleft than E. Since event 1 is not presented,
10 units of H are produced in the $S_{c1}$ cleft. No E is produced in the
$S_{c1}$ cleft. On the first presentation of the pattern, the amounts of
E and H produced in the $S_{c2}$ cleft approximately balance. On the second
presentation of the pattern, 10 units of E are produced in the $S_{c2}$
cleft and no H is produced.

The fourth excitement of the command node results in a prediction
excitement of the grid. The third response on the grid x traces is
this prediction excitement of the grid. From the results we can conclude
that the outstar has learned the pattern $V_2 \rightarrow V_3$. It has also learned
to  directly  inhibit  grid  nodes  V.  and  V  . 

The experiment was continued to test a random mistake in the
previously learned pattern $V_2 \rightarrow V_3$. Figure 8.7.2 shows the results.
The direct inhibition of $V_1$ which the outstar had previously learned
caused $x_1(t)$ to rise to a value of only 5. (The input impulse, $P_1(t)$,
has an amplitude of 10.) The amounts of H and E produced in the $S_{c1}$
cleft approximately balance. Thus when a prediction is excited by the
second excitement of the command node, $x_1(t)$ rises to only a slight
positive value. The amount of H produced in $S_{c1}$ during the prediction
excitement is considerably more than the E produced. Further prediction
excitements will result in inhibited amplitudes for $x_1(t)$. We may
conclude that the outstar has good resistance to random mistakes.

Figure 8.7.2. Resistance to random mistakes in a chemical outstar.

Figure 8.7.3. An unsuccessful attempt to correct a previously learned
pattern in a chemical outstar.

The experiment was continued to test the correctability of the
outstar. The correcting pattern was presented twice. Figure
8.7.3 shows the results. Although the outstar did learn the correcting
pattern, it did not "unlearn" the previously learned pattern $V_2 \rightarrow V_3$.
There is no "forgetting rate" in the chemical outstar. Thus the old
pattern can not be forgotten. There is also no lateral inhibition
in this outstar. Thus, this chemical outstar lacks the two mechanisms
whereby previously learned patterns can be removed from its memory.

This is a major drawback in this outstar. Further work with it would
require investigations of the effects of a finite forgetting rate for
the E and H substances in the synaptic clefts. Additionally, the effects
of lateral inhibition should be investigated.
APPENDIX A

The Digital Simulation and its Accuracy

The equations which were simulated in this thesis were simultaneous
nonlinear differential difference equations. They fell into
three basic types:

A.1     $\dot{x}(t) = -\alpha x(t) + I_x(t)$

A.2     $\dot{y}(t) = -\alpha y(t) + z(t)\, x(t - \tau) + I_y(t)$

A.3     $\dot{z}(t) = -u z(t) + y(t)\, x(t - \tau)$

Figure A.1 shows a system flow diagram for this set of equations.

The key to the digital simulation is the algorithm used for the
integrators. This thesis used a simple Euler rule algorithm. That
is, the integral:

$r(t) = \int \dot{r}(t)\, dt$

was simulated by the algebraic equation:

$r(t + h) = r(t) + \dot{r}(t)\, h$

where h is the digital increment.

The Euler rule algorithm was adopted because it is easy to program
on a high speed digital computer and requires comparatively
little computation. The large number of experiments simulated
in this thesis required efficient use of computation time. Most of the
experiments involved at least seven variables and required over fifty
increments. Thus the simplest and fastest integration algorithm was
selected.
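The Euler rule applied to equation A.1 with a unit step input can be sketched as follows. This is an illustration in Python rather than the FOCAL program used for the thesis; the function names are hypothetical, and the closed form used for comparison is the standard continuous step response of a first-order decay.

```python
import math


def euler_step_response(alpha, h, n_steps):
    """Euler-rule solution of x' = -alpha*x + 1 starting from x(0) = 0.

    Returns the list [x(0), x(h), ..., x(n_steps*h)].
    """
    x = 0.0
    out = [x]
    for _ in range(n_steps):
        x = x + (1.0 - alpha * x) * h  # r(t + h) = r(t) + r'(t) * h
        out.append(x)
    return out


def exact_step_response(alpha, t):
    """Continuous solution (1/alpha) * (1 - exp(-alpha * t)) for t >= 0."""
    return (1.0 - math.exp(-alpha * t)) / alpha


if __name__ == "__main__":
    alpha, h = 3.3333, 0.1
    xs = euler_step_response(alpha, h, 10)
    for n, x in enumerate(xs):
        print(round(n * h, 1), x, exact_step_response(alpha, n * h))
```

Both solutions settle at 1/alpha, but with a coarse increment the Euler values run ahead of the continuous curve on the way there, which is the kind of amplitude error the rest of this appendix estimates.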

Figure A.1. A signal flow diagram for the simulation.

The sampled data "z" transform for the equation

A.4     $\dot{x}(t) = -\alpha x(t) + I_x(t)$

using an Euler rule integration algorithm, is:

$$x^*(z) = \frac{h}{z - 1 + h\alpha}\, I_x^*(z)$$

For:

$I_x(t) = u_{-1}(t)$

where:

$$u_{-1}(t) = \begin{cases} 0 & \text{for } t < 0 \\ 1 & \text{for } t \ge 0 \end{cases}$$

$x^*(z)$ is:

$$x^*(z) = \frac{h}{z} \left( \frac{(1/(h\alpha))\, z}{z - 1} + \frac{(1 - 1/(h\alpha))\, z}{z - 1 + h\alpha} \right)$$

The time varying function which this transforms to is:

$$x^*(t) = (1/\alpha)\, u_{-1}(t - h) + (h - 1/\alpha)\, e^{-\gamma (t - h)} \quad \text{for } t \ge h$$

where:

$\gamma = -(1/h) \ln(1 - h\alpha)$

The continuous solution to A.4 when $I_x(t) = u_{-1}(t)$ is:

$$x(t) = (1/\alpha)(1 - e^{-\alpha t}) \quad \text{for } t \ge 0$$

For $t \ge h$, the ratio:

A.5     $\dfrac{x^*(t)}{x(t - h)} = 1 + \dfrac{\alpha h\, e^{-\gamma (t - h)}}{1 - e^{-\alpha (t - h)}}$     for $t \ge h$

computed at $(t - h) = 1/\alpha$ was used to check the accuracy of the amplitudes
of the digitally simulated function $x^*(t)$. The ratio $\gamma/\alpha$ was used to
check the accuracy of the simulated decay rate, $\gamma$. The two most
frequently used choices for $\alpha$ and h in this study were:

$\alpha = 3.3333$, h = 0.1

and:

$\alpha = 1.6666$, h = 0.1
The following table shows the accuracy of the simulation to a
step input:

    $\alpha h$        $\gamma/\alpha = -(1/(\alpha h)) \ln(1 - \alpha h)$        $x^*(t)/x(t - h) = 1 + \alpha h\, e^{-\gamma (t - h)} / (1 - e^{-\alpha (t - h)})$

    0.3333        1.170        1.163

    0.166666      1.097        1.087

Since all of the input pulses used in the study were of duration
$1/\alpha$, the response of the x processes to input impulses is in error
by at most 17%. The simulated decay rates are in error by at most
17% also.

No attempt was made to analytically compute the error in the
simulated response of the x and z processes to nonlinear inputs.
The results were self-consistent and agreed qualitatively with
Grossberg's theoretical predictions. Throughout this study a
qualitative feel for the networks studied and the parameters involved
in them was the primary concern. As long as the simulation agreed
qualitatively with theoretical expectations, little concern was given
to the possibility of up to 20% amplitude errors in the computations.

The computation order and actual equations used to simulate
equations A.1 through A.3 were:

A.5     $x(t + h) = x(t) + (I_x(t) - \alpha x(t))\, h$

A.6     $y(t + h) = y(t) + (I_y(t) - \alpha y(t) + z(t)\, x(t - \tau))\, h$

A.7     $z(t + h) = z(t) + (y(t + h)\, x(t + h - \tau))\, h$

where t = hn, and n is an integer.

$\tau$ was always chosen to be an integer multiple of h. The sequence
A.5, A.6, A.7 was computed and then started again with A.5 for the next
incrementation. Thus the values for z(t) in A.6 were effectively
delayed by h.
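The computation order can be made concrete with a short loop. This is a sketch in Python, not the original FOCAL program: the function name `simulate`, the list-based storage, and the convention that delayed values before the start of the run are zero are all assumptions made for illustration. The update for z follows A.7 exactly as written above.

```python
def simulate(alpha, h, tau, n_steps, I_x, I_y, z0=0.0):
    """Step the difference equations in the order A.5, A.6, A.7.

    I_x, I_y: functions of time giving the inputs.
    tau must be an integer multiple of h. Returns the lists x, y, z
    sampled at t = 0, h, 2h, ..., n_steps*h.
    """
    d = round(tau / h)  # transmission delay measured in increments
    x, y, z = [0.0], [0.0], [z0]

    def delayed(seq, n):
        # Value of seq at step n - d; assumed zero before the run starts.
        k = n - d
        return seq[k] if k >= 0 else 0.0

    for n in range(n_steps):
        t = n * h
        # A.5: x(t + h) from x(t)
        x.append(x[n] + (I_x(t) - alpha * x[n]) * h)
        # A.6: y(t + h) uses the old z(t) and the delayed x(t - tau)
        y.append(y[n] + (I_y(t) - alpha * y[n] + z[n] * delayed(x, n)) * h)
        # A.7: z(t + h) uses the freshly computed y(t + h) and x(t + h - tau)
        z.append(z[n] + (y[n + 1] * delayed(x, n + 1)) * h)
    return x, y, z


if __name__ == "__main__":
    x, y, z = simulate(alpha=1.0, h=0.1, tau=0.1, n_steps=5,
                       I_x=lambda t: 10.0, I_y=lambda t: 10.0)
    print(x[-1], y[-1], z[-1])
```

Because A.6 reads z before A.7 updates it, y always sees the value of z from the previous increment, which is the one-step delay noted in the text.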

The digital computer used for the simulations reported was a
Digital Equipment Corporation PDP-9 with 32K of core memory. The
programs used were programmed in the Digital Equipment Corporation's
interpretive language FOCAL. The choice to use FOCAL was made because
FOCAL allows the dimensions of matrices to be variables that can be
specified at run time. The programs used stored the value of each of
the variables being computed after each incrementation. The stored
values were output at the end of each run. Since the number of
variables and the number of incrementations per run varied
considerably, the ability to specify matrix dimensions in the programs
immediately before the run was a great advantage.

The minimum accuracy in calculations performed by FOCAL is six
digits. Since the sampled data error was on the order of 17%, six
digits of computational accuracy was entirely sufficient.